Manuel-Derrien closed this issue 2 years ago.
Hello Manuel,
I have seen your issue and will respond more in-depth on Thursday (7/14) as I am currently on vacation with family. Thank you for using my tool and I hope I can be of help. Sorry for the delay in response!
Hello Scott,
Don't worry about the delay, and enjoy your vacation!
Hello Manuel,
For this issue, please refer to the scripts transposon/density_data.py and examples/general_read_density_data.py. The primary way to import/read individual or multiple .h5 files is the classmethod from_list_gene_data_and_hdf5_dir() in transposon/density_data.py. This is the function called at the very end of general_read_density_data.py. The class method itself has more documentation in the code; I will try to summarize it below.
from_list_gene_data_and_hdf5_dir() is the way to read one or multiple .h5 files. Currently I have no plans to implement a merge of the output data across pseudomolecules. That is to say, when you read the data (initialize the DensityData class), you receive one instance of DensityData for each pseudomolecule in your dataset. The processed_dd_data object at the end of general_read_density_data.py is a Python list of DensityData instances; each instance corresponds to one pseudomolecule of TE Density data.
I will update the general_read_density_data.py and from_list_gene_data_and_hdf5_dir() functions over the coming days to include some of the explanations I have written above.
"Arabidopsis_(.*?).h5"
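The "Arabidopsis_(.*?).h5" argument is a Python regular expression. Judging from the traceback later in this thread (x.group(1)), the script appears to keep whatever the parenthesised capture group matches in each .h5 filename, and that string is what has to line up with a pseudomolecule ID. The sketch below uses only the standard re module on hypothetical filenames such as Arabidopsis_Chr1.h5. It illustrates the regex and is not the tool's own code.

```python
import re

# Hypothetical filenames for illustration; real names depend on your own run.
filenames = ["Arabidopsis_Chr1.h5", "Arabidopsis_Chr2.h5", "Arabidopsis_ChrM.h5"]

# "Arabidopsis_" must appear literally, (.*?) lazily captures the text up to
# ".h5", and \. escapes the dot so it only matches a literal "." character.
pattern = re.compile(r"Arabidopsis_(.*?)\.h5")

# group(1) is the text captured by the parentheses.
ids = [pattern.search(name).group(1) for name in filenames]
print(ids)  # ['Chr1', 'Chr2', 'ChrM']
```

In the original pattern the unescaped "." in ".h5" matches any character. It still works on names like these, but \. is stricter.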
Hello Scott,
First, thank you for this complete answer. The format of my files is like "paternal_scaffold_1", "paternal_scaffold_2", ...
When I run python3 /mnt/c/Users/manu/Desktop/A._analyse/TE_density/TE_Density-master/general_read_density_data.py /mnt/c/Users/manu/Desktop/A._analyse/workspace_n1/Cleaned_paternal_imprinted.tsv /mnt/c/Users/manu/Desktop/A._analyse/results_paternal "paternal_scaffold_(.*?).h5"
I get this error message:
2022-07-15 10:07:08 laptop __main__[158] INFO Successfully imported the preprocessed gene annotation information: /mnt/c/Users/manu/Desktop/A._analyse/workspace_n1/Cleaned_paternal_imprinted.tsv
Traceback (most recent call last):
  File "/mnt/c/Users/manu/Desktop/A._analyse/TE_density/TE_Density-master/general_read_density_data.py", line 70, in <module>
    processed_dd_data = DensityData.from_list_gene_data_and_hdf5_dir(
  File "/mnt/c/Users/manu/Desktop/A._analyse/TE_density/TE_Density-master/transposon/density_data.py", line 590, in from_list_gene_data_and_hdf5_dir
    [x.group(1) for x in chromosome_ids_unprocessed_h5_files]
  File "/mnt/c/Users/manu/Desktop/A._analyse/TE_density/TE_Density-master/transposon/density_data.py", line 590, in <listcomp>
    [x.group(1) for x in chromosome_ids_unprocessed_h5_files]
AttributeError: 'NoneType' object has no attribute 'group'
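This AttributeError is standard Python re behaviour. re.search() and re.match() return None when a pattern does not match, and calling .group(1) on None raises exactly this error. So it most likely means at least one .h5 filename did not match "paternal_scaffold_(.*?).h5". The thread does not show whether density_data.py uses re.match (anchored at the start of the name) or re.search (anywhere in the name); this sketch assumes search.

Below is a minimal pre-flight check that lists the offending files instead of crashing. extract_ids and the filenames are hypothetical and not part of TE_Density.

```python
import re


def extract_ids(filenames, regex):
    """Hypothetical helper: return group(1) for every filename, or raise a
    ValueError that names the files the regex failed to match."""
    pattern = re.compile(regex)
    ids, unmatched = [], []
    for name in filenames:
        match = pattern.search(name)
        if match is None:
            # Calling match.group(1) here would raise
            # AttributeError: 'NoneType' object has no attribute 'group'
            unmatched.append(name)
        else:
            ids.append(match.group(1))
    if unmatched:
        raise ValueError(f"{regex!r} did not match: {unmatched}")
    return ids


# Hypothetical folder contents with one name that does not fit the pattern.
try:
    extract_ids(["paternal_scaffold_1.h5", "maternal_scaffold_1.h5"],
                r"paternal_scaffold_(.*?)\.h5")
except ValueError as err:
    print(err)
```

To try it on a real results folder, pass something like sorted(os.listdir(folder)) as filenames.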
Hi Manuel,
Please try doing this on a new branch I just uploaded: d/additional_user_help_density_data
I added some log info statements that report, at run time, how the pseudomolecule ID is matched to the output .h5 file. That should make it easier for users to see when their regex is not producing the string needed to initialize DensityData.
And I think the regex pattern you need to supply is "paternal_(.*?).h5".
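Which of the two patterns is right depends on what the captured text must equal. The new log statements are meant to show how the captured string is matched to the pseudomolecule ID, so it can help to compare candidate patterns against the IDs you expect. A sketch with hypothetical filenames and hypothetical annotation IDs; check_pattern is not part of TE_Density:

```python
import re


def check_pattern(filenames, regex, expected_ids):
    """Hypothetical helper: map each filename to (captured ID, ID is expected)."""
    pattern = re.compile(regex)
    report = {}
    for name in filenames:
        match = pattern.search(name)
        captured = match.group(1) if match else None  # None when no match
        report[name] = (captured, captured in expected_ids)
    return report


# Hypothetical inputs: substitute your own .h5 names and annotation IDs.
files = ["paternal_scaffold_1.h5", "paternal_scaffold_2.h5"]
annotation_ids = {"scaffold_1", "scaffold_2"}

# With these inputs, only the second pattern captures IDs found in annotation_ids.
for regex in [r"paternal_scaffold_(.*?)\.h5", r"paternal_(.*?)\.h5"]:
    print(regex, check_pattern(files, regex, annotation_ids))
```

The design choice is to report per file rather than fail fast, so one run shows every mismatch at once.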
Alright, thanks a lot!
Hello,
I'm trying to merge the .h5 files to output plots for the whole genome. I've seen your explanation of how to output a single output file and read the density data, but I don't really understand the regex format we need to input.
python examples/general_read_density_data.py CLEANED_GENE_ANNOTATION.tsv DENSITY_DATA_FOLDER "Arabidopsis_(.*?).h5"
Could you provide a more explicit way to input all the .h5 files at once, or to merge them? I especially don't understand the "Arabidopsis_(.*?).h5" part. I hope what I said is clear enough; feel free to ask for more information.
All the best.