Closed zmz1988 closed 1 year ago
In case you would like to see the packages
(TE_density) $ conda list --name TE_density
# packages in environment at /miniconda3/envs/TE_density:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
attrs 22.1.0 pypi_0 pypi
black 22.10.0 pypi_0 pypi
bzip2 1.0.8 h7f98852_4 conda-forge
ca-certificates 2022.12.7 ha878542_0 conda-forge
click 8.1.3 pypi_0 pypi
coloredlogs 15.0.1 pypi_0 pypi
contourpy 1.0.6 pypi_0 pypi
cycler 0.11.0 pypi_0 pypi
exceptiongroup 1.0.4 pypi_0 pypi
fonttools 4.38.0 pypi_0 pypi
h5py 3.7.0 pypi_0 pypi
humanfriendly 10.0 pypi_0 pypi
iniconfig 1.1.1 pypi_0 pypi
kiwisolver 1.4.4 pypi_0 pypi
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 12.2.0 h65d4601_19 conda-forge
libgomp 12.2.0 h65d4601_19 conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libsqlite 3.40.0 h753d276_0 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libzlib 1.2.13 h166bdaf_4 conda-forge
matplotlib 3.6.2 pypi_0 pypi
mypy-extensions 0.4.3 pypi_0 pypi
ncurses 6.3 h27087fc_1 conda-forge
numexpr 2.8.4 pypi_0 pypi
numpy 1.23.5 pypi_0 pypi
openssl 3.1.0 h0b41bf4_0 conda-forge
packaging 21.3 pypi_0 pypi
pandas 1.5.2 pypi_0 pypi
pathspec 0.10.2 pypi_0 pypi
pillow 9.3.0 pypi_0 pypi
pip 23.0.1 pyhd8ed1ab_0 conda-forge
platformdirs 2.5.4 pypi_0 pypi
pluggy 1.0.0 pypi_0 pypi
pyparsing 3.0.9 pypi_0 pypi
pytest 7.2.0 pypi_0 pypi
python 3.10.9 he550d4f_0_cpython conda-forge
python-dateutil 2.8.2 pypi_0 pypi
pytz 2022.6 pypi_0 pypi
readline 8.1.2 h0f457ee_0 conda-forge
scipy 1.9.3 pypi_0 pypi
setuptools 67.6.0 pyhd8ed1ab_0 conda-forge
six 1.16.0 pypi_0 pypi
tables 3.7.0 pypi_0 pypi
tk 8.6.12 h27826a3_0 conda-forge
tomli 2.0.1 pypi_0 pypi
tqdm 4.64.1 pypi_0 pypi
typing-extensions 3.7.4.3 pypi_0 pypi
tzdata 2022g h191b570_0 conda-forge
wcwidth 0.1.7 pypi_0 pypi
wheel 0.40.0 pyhd8ed1ab_0 conda-forge
wrapt 1.11.2 pypi_0 pypi
xz 5.2.6 h166bdaf_0 conda-forge
Hi, Thank you for the detailed report.
The first error, with check_nulls, arises because Python does not know where to search for the modules. I suggest you look at section 6.1.2 of the Python documentation, which covers the module search path and PYTHONPATH.
Second, can you please try running the system test? If you are on the most recent version of the master branch, you should be able to run make system_test. If that works, we should be able to move forward easily from there. I suspect that it is either a package issue related to installing from conda, or an H5 file that was not properly written during a previously failed run and is now being re-opened. In the latter case, I would suggest deleting all H5 files and starting once more.
Did the revision step work OK?
Thanks for replying so quickly! Yes, you're right! It was the old H5 files from the failed run that caused the problems. I deleted the whole output folder and ran it again. The run was successful! Thanks a lot!
Also, I managed to make check_nulls work as well, by adding sys.path.append('/absolute_path_to_github_downloads/TE_Density/') before the imports in the helper script. I'm just sharing it here in case other Python beginners like me don't know how to solve it.
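For readers following along, the workaround described above can be sketched as below. This is a minimal illustration, not code from the TE_Density repository: the helper name `add_repo_to_path` is hypothetical, and the path is the same placeholder used above and must be replaced with the real location of the clone.

```python
import sys
from pathlib import Path

# Placeholder path (assumption): replace with the directory that contains
# the cloned TE_Density repository on your machine.
REPO_ROOT = Path("/absolute_path_to_github_downloads/TE_Density")


def add_repo_to_path(repo_root: Path) -> None:
    """Append repo_root to sys.path once so its packages become importable."""
    root = str(repo_root)
    if root not in sys.path:
        sys.path.append(root)


add_repo_to_path(REPO_ROOT)

# Imports that live inside the repository (for example the `transposon`
# package seen in the traceback paths) must come after this point.
```

Setting the PYTHONPATH environment variable to the same directory before launching Python has the same effect without editing the script.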
Thanks again! :)
Sorry, it's me again. I hope it's OK to ask another question here?
I ran into a mismatch between the pseudomolecule names in the H5 files and those in the GeneData file, but I don't really see why they don't match. Could you give me some advice please? Thanks a lot in advance!
(TE_density) $ python /TE_Density/examples/general_read_density_data.py Cleaned_Co_new_annotation_clean1_cleanTag.tsv Co_TE_density "Co_genome_(.*?).h5"
2023-03-17 11:39:22 baddiel.local __main__[12346] INFO import of preprocessed gene annotation... success!
2023-03-17 11:39:22 baddiel.local __main__[12346] INFO
Using the user's provided regex string 'Co_genome_(.*?).h5' to match file
objects and identify the proper pseudomolecule group for
each file. Regex group 1 of this string must correspond to
a pseudomolecule. This is needed to initialize DensityData.
The user should verify that the pseudomolecule IDs derived from
the GeneData correspond to the groups derived from the filename
of the output .h5 data.
2023-03-17 11:39:22 baddiel.local __main__[12346] INFO Pseudomolecule from GeneData is Chr1_RagTag_polished, Regex group 1 of <re.Match object; span=(77, 111), match='Co_genome_Chr1_RagTag_polished.h5'> is Chr1_RagTag_polished
2023-03-17 11:39:22 baddiel.local __main__[12346] INFO Pseudomolecule from GeneData is Chr2_RagTag_polished, Regex group 1 of <re.Match object; span=(77, 111), match='Co_genome_Chr2_RagTag_polished.h5'> is Chr2_RagTag_polished
2023-03-17 11:39:22 baddiel.local __main__[12346] INFO Pseudomolecule from GeneData is Chr3_RagTag_polished, Regex group 1 of <re.Match object; span=(77, 111), match='Co_genome_Chr3_RagTag_polished.h5'> is Chr3_RagTag_polished
2023-03-17 11:39:22 baddiel.local __main__[12346] INFO Pseudomolecule from GeneData is Chr4_RagTag_polished, Regex group 1 of <re.Match object; span=(77, 111), match='Co_genome_Chr4_RagTag_polished.h5'> is Chr4_RagTag_polished
2023-03-17 11:39:22 baddiel.local __main__[12346] INFO Pseudomolecule from GeneData is Chr5_RagTag_polished, Regex group 1 of <re.Match object; span=(77, 111), match='Co_genome_Chr5_RagTag_polished.h5'> is Chr5_RagTag_polished
2023-03-17 11:39:22 baddiel.local __main__[12346] INFO Pseudomolecule from GeneData is ChrC_RagTag_polished, Regex group 1 of <re.Match object; span=(77, 111), match='Co_genome_ChrC_RagTag_polished.h5'> is ChrC_RagTag_polished
2023-03-17 11:39:22 baddiel.local __main__[12346] INFO Pseudomolecule from GeneData is ChrM_RagTag_polished, Regex group 1 of <re.Match object; span=(77, 111), match='Co_genome_ChrM_RagTag_polished.h5'> is ChrM_RagTag_polished
2023-03-17 11:39:22 baddiel.local __main__[12346] CRITICAL The strings of chromosomes in your unprocessed
hdf5 files: ['Chr1_RagTag_polished', 'Chr1_RagTag_polished_GeneData', 'Chr1_RagTag_polished_TEData', 'Chr1_RagTag_polished_overlap', 'Chr2_RagTag_polished', 'Chr2_RagTag_polished_GeneData', 'Chr2_RagTag_polished_TEData', 'Chr2_RagTag_polished_overlap', 'Chr3_RagTag_polished', 'Chr3_RagTag_polished_GeneData', 'Chr3_RagTag_polished_TEData', 'Chr3_RagTag_polished_overlap', 'Chr4_RagTag_polished', 'Chr4_RagTag_polished_GeneData', 'Chr4_RagTag_polished_TEData', 'Chr4_RagTag_polished_overlap', 'Chr5_RagTag_polished', 'Chr5_RagTag_polished_GeneData', 'Chr5_RagTag_polished_TEData', 'Chr5_RagTag_polished_overlap', 'ChrC_RagTag_polished', 'ChrC_RagTag_polished_GeneData', 'ChrC_RagTag_polished_TEData', 'ChrC_RagTag_polished_overlap', 'ChrM_RagTag_polished', 'ChrM_RagTag_polished_GeneData', 'ChrM_RagTag_polished_TEData', 'ChrM_RagTag_polished_overlap'], identified using your supplied
regex pattern: 'Co_genome_(.*?).h5', do not match the
chromosomes in the GeneData: ['Chr1_RagTag_polished', 'Chr2_RagTag_polished', 'Chr3_RagTag_polished', 'Chr4_RagTag_polished', 'Chr5_RagTag_polished', 'ChrC_RagTag_polished', 'ChrM_RagTag_polished'].
Traceback (most recent call last):
File "/TE_Density/examples/general_read_density_data.py", line 80, in <module>
processed_dd_data = DensityData.from_list_gene_data_and_hdf5_dir(
File "/TE_Density/transposon/density_data.py", line 666, in from_list_gene_data_and_hdf5_dir
raise ValueError
ValueError
Hi,
Since it is looking for any .h5 files, and it seems to be picking up the TE data and gene data files as well, I would suggest you place only the density data output files (not the gene, overlap, or TE data files) in a directory of their own, and then try re-running. I think that should fix it.
Whether it works or not, please let me know. I'll make a note to improve the error message for a future release.
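To see why the extra files trigger the error, here is a small sketch of how the supplied pattern behaves on different file names. It is an illustration only and not the actual TE_Density matching code; the `_GeneData`, `_TEData`, and `_overlap` file names are assumptions inferred from the groups listed in the CRITICAL message, and `extract_groups` is a hypothetical helper.

```python
import re

# The pattern exactly as passed on the command line in the report.
pattern = re.compile(r"Co_genome_(.*?).h5")

# File names assumed from the groups shown in the error message.
filenames = [
    "Co_genome_Chr1_RagTag_polished.h5",
    "Co_genome_Chr1_RagTag_polished_GeneData.h5",
    "Co_genome_Chr1_RagTag_polished_TEData.h5",
    "Co_genome_Chr1_RagTag_polished_overlap.h5",
]

# Chromosome IDs as reported for the GeneData (one shown for brevity).
gene_data_chromosomes = {"Chr1_RagTag_polished"}


def extract_groups(names, regex):
    """Return regex group 1 for every file name the pattern matches."""
    found = []
    for name in names:
        match = regex.search(name)
        if match:
            found.append(match.group(1))
    return found


groups = extract_groups(filenames, pattern)

# Groups that have no counterpart in the GeneData chromosome list.
unexpected = sorted(set(groups) - gene_data_chromosomes)
print(unexpected)
# ['Chr1_RagTag_polished_GeneData', 'Chr1_RagTag_polished_TEData',
#  'Chr1_RagTag_polished_overlap']
```

Every `.h5` file whose name starts with `Co_genome_` matches, so the intermediate files contribute extra "chromosome" names that the GeneData does not contain.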
Yes, it's fixed now, as you suggested. I previously ran general_read_density_data.py directly on the output folder of process_genome.py, and assumed the folder contained no files other than the density data .h5 output (but of course there are other .h5 files in the filtered_input_data and tmp folders). I have now created a new directory containing links to only the density output .h5 files, and everything runs smoothly.
It's so embarrassing that I didn't think to check the tmp and filtered_input_data folders; otherwise I could have solved this by myself. Thanks again for your kind help!
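For anyone who wants to script the fix described above, the following is a rough sketch of collecting only the density output files into a separate directory of symbolic links. The function `link_density_files` and its parameters are hypothetical, the directory layout is assumed from the description in this thread, and the dot before `h5` is escaped here so that it only matches a literal dot.

```python
import re
from pathlib import Path


def link_density_files(output_dir, link_dir, regex, chromosomes):
    """Symlink .h5 files whose regex group 1 is a known chromosome ID.

    output_dir  : folder written by the density run (searched recursively)
    link_dir    : new folder, outside output_dir, that receives the links
    regex       : pattern whose group 1 captures the pseudomolecule name
    chromosomes : set of IDs expected from the gene annotation
    """
    pattern = re.compile(regex)
    link_dir = Path(link_dir)
    link_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for h5 in sorted(Path(output_dir).rglob("*.h5")):
        match = pattern.fullmatch(h5.name)
        # Skip intermediate files such as *_GeneData.h5 or *_overlap.h5,
        # whose captured group is not a plain chromosome ID.
        if match and match.group(1) in chromosomes:
            target = link_dir / h5.name
            if not target.exists():
                target.symlink_to(h5.resolve())
            linked.append(target.name)
    return linked


# Example call (paths are placeholders):
# link_density_files("Co_TE_density", "Co_density_only",
#                    r"Co_genome_(.*?)\.h5", {"Chr1_RagTag_polished"})
```

Filtering on the chromosome IDs rather than on folder names means the sketch does not depend on exactly where the intermediate files are stored.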
It is totally OK! I'm glad I could help. It is useful to see where people get stuck, so I can write better error messages.
Good luck with your project!
Thanks a lot!
Hello, thanks a lot for providing this nice package! I recently tried it, but got the error below. I can't figure out what could be wrong. If you could point me to where to check, I would really appreciate it!
My input files were generated using your helper scripts import_Arabidopsis_gene_anno.py and import_Arabidopsis_EDTA.py, as my TE file and gene annotation file fit the way those helper scripts parse their input. Then I used the suggested command
python /home/TE_Density/process_genome.py Cleaned_Co_new_annotation_clean1_cleanTag.tsv Cleaned_Co.fasta.mod.EDTA.TEanno.gff3.tsv Co_genome -c /home/TE_Density/config/production_run_config.ini -n 5 -o Co_TE_density
to perform the TE density analysis. However, I got the errors below (some parts of the screen output are omitted; the output was generated using --reset_h5 after the first failed trial):