sjteresi / TE_Density

Python script calculating transposable element density for all genes in a genome. Publication: https://mobilednajournal.biomedcentral.com/articles/10.1186/s13100-022-00264-4
GNU General Public License v3.0

Issue merging all chromosomes in the post-processing step. #109

Closed Manuel-Derrien closed 2 years ago

Manuel-Derrien commented 2 years ago

Hello,

I'm trying to merge the .h5 files to produce plots for the whole genome. I've seen your explanation of how to output a single file and read the density data, but I don't really understand the regex format we need to supply:

python examples/general_read_density_data.py CLEANED_GENE_ANNOTATION.tsv DENSITY_DATA_FOLDER "Arabidopsis_(.*?).h5"

Could you provide a more explicit way to input all the .h5 files at once, or to merge them? I especially don't understand the "Arabidopsis_(.*?).h5" part.

I hope this is clear enough; feel free to ask for more information.

All the best.

sjteresi commented 2 years ago

Hello Manuel,

I have seen your issue and will respond more in-depth on Thursday (7/14) as I am currently on vacation with family. Thank you for using my tool and I hope I can be of help. Sorry for the delay in response!

Manuel-Derrien commented 2 years ago

Hello Scott,

Don't worry about the delay, and enjoy your vacation!

sjteresi commented 2 years ago

Hello Manuel,

For this issue please refer to the scripts transposon/density_data.py and examples/general_read_density_data.py. The primary way to import/read individual or multiple .h5 files is through the classmethod from_list_gene_data_and_hdf5_dir() in transposon/density_data.py. This is the function used at the very end of general_read_density_data.py. That class method has more documentation in the code; I will try to summarize below.
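As a rough illustration of what the regex argument is for (this is not the TE_Density code itself): the pattern is matched against each .h5 file name, and whatever the parenthesized group `(.*?)` captures is taken as the chromosome ID. The sketch below assumes the pattern is applied with `re.search` and uses hypothetical file names such as `Arabidopsis_Chr1.h5`; the helper `extract_chromosome_id` is made up for this example.

```python
import re


def extract_chromosome_id(pattern, filename):
    """Return the text captured by the first group, or None if there is no match.

    Hypothetical helper for illustration; not part of TE_Density.
    """
    match = re.search(pattern, filename)
    return match.group(1) if match else None


# Hypothetical output file names; real names depend on your genome ID and chromosomes.
filenames = ["Arabidopsis_Chr1.h5", "Arabidopsis_Chr2.h5"]

# "(.*?)" is a non-greedy capture group: it grabs as little text as possible
# between the literal prefix "Arabidopsis_" and the ".h5" ending.
# Note that the unescaped "." before "h5" matches any single character.
pattern = r"Arabidopsis_(.*?).h5"

ids = [extract_chromosome_id(pattern, name) for name in filenames]
print(ids)  # ['Chr1', 'Chr2']
```

So the text before the parentheses should be the part of the file name shared by every file, and the parentheses should cover the part that differs from file to file.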

Manuel-Derrien commented 2 years ago

Hello Scott,

First, thank you for this complete answer. My files are named like "paternal_scaffold_1", "paternal_scaffold_2", and so on. When I run:

python3 /mnt/c/Users/manu/Desktop/A._analyse/TE_density/TE_Density-master/general_read_density_data.py /mnt/c/Users/manu/Desktop/A._analyse/workspace_n1/Cleaned_paternal_imprinted.tsv /mnt/c/Users/manu/Desktop/A._analyse/results_paternal "paternal_scaffold_(.*?).h5"

I get this error message:

2022-07-15 10:07:08 laptop __main__[158] INFO Successfully imported the preprocessed gene annotation information: /mnt/c/Users/manu/Desktop/A._analyse/workspace_n1/Cleaned_paternal_imprinted.tsv
Traceback (most recent call last):
  File "/mnt/c/Users/manu/Desktop/A._analyse/TE_density/TE_Density-master/general_read_density_data.py", line 70, in <module>
    processed_dd_data = DensityData.from_list_gene_data_and_hdf5_dir(
  File "/mnt/c/Users/manu/Desktop/A._analyse/TE_density/TE_Density-master/transposon/density_data.py", line 590, in from_list_gene_data_and_hdf5_dir
    [x.group(1) for x in chromosome_ids_unprocessed_h5_files]
  File "/mnt/c/Users/manu/Desktop/A._analyse/TE_density/TE_Density-master/transposon/density_data.py", line 590, in <listcomp>
    [x.group(1) for x in chromosome_ids_unprocessed_h5_files]
AttributeError: 'NoneType' object has no attribute 'group'
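For context on the error above: Python's regex functions return None when a pattern does not match, and calling `.group(1)` on None raises exactly this AttributeError. So at least one file name in the folder was not matched by the pattern. A minimal sketch for finding which ones, assuming `re.search` and using hypothetical file names (the helper `split_by_match` is made up for this example and is not part of TE_Density):

```python
import re


def split_by_match(pattern, filenames):
    """Separate file names into those the pattern matches and those it does not.

    Returns (matched, unmatched), where matched maps file name -> captured group.
    Hypothetical helper for illustration only.
    """
    matched, unmatched = {}, []
    for name in filenames:
        result = re.search(pattern, name)
        if result is None:
            # This is the case that leads to
            # "'NoneType' object has no attribute 'group'".
            unmatched.append(name)
        else:
            matched[name] = result.group(1)
    return matched, unmatched


# Hypothetical folder contents: one file fits the pattern, one does not.
filenames = ["paternal_scaffold_1.h5", "other_output.h5"]
matched, unmatched = split_by_match(r"paternal_scaffold_(.*?).h5", filenames)

print(matched)    # {'paternal_scaffold_1.h5': '1'}
print(unmatched)  # ['other_output.h5']
```

Any name that ends up in `unmatched` would trigger the error when `.group(1)` is called on its (missing) match.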

sjteresi commented 2 years ago

Hi Manuel,

Please try doing this on a new branch I just uploaded: d/additional_user_help_density_data

I added some log info statements that show, at run time, how the pseudomolecule ID is being matched to each output .h5 file. That should make it easier to see when a regex is not producing the string needed to initialize DensityData.

And I think the regex pattern you need to supply is "paternal_(.*?).h5"
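To see why the position of the parentheses matters, here is a small sketch comparing the two patterns from this thread. It assumes `re.search`, hypothetical file names of the form `paternal_scaffold_1.h5`, and a hypothetical set of chromosome IDs in the gene annotation; your real file names and IDs may differ, so treat it only as a way to check a pattern before running the full script.

```python
import re


def captured_ids(pattern, filenames):
    """Collect the first capture group from every file name the pattern matches.

    Hypothetical helper for illustration; not part of TE_Density.
    """
    ids = set()
    for name in filenames:
        result = re.search(pattern, name)
        if result is not None:
            ids.add(result.group(1))
    return ids


# Hypothetical inputs: adjust to the actual folder contents and annotation.
filenames = ["paternal_scaffold_1.h5", "paternal_scaffold_2.h5"]
annotation_ids = {"scaffold_1", "scaffold_2"}

for pattern in (r"paternal_scaffold_(.*?).h5", r"paternal_(.*?).h5"):
    ids = captured_ids(pattern, filenames)
    # The captured strings need to equal the chromosome IDs in the annotation.
    print(pattern, sorted(ids), ids == annotation_ids)
```

With these made-up inputs the first pattern captures only "1" and "2", while the second captures "scaffold_1" and "scaffold_2", which is the form that matches the example annotation IDs.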

Manuel-Derrien commented 2 years ago

Alright, thanks a lot!