jodyphelan / TBProfiler

Profiling tool for Mycobacterium tuberculosis to detect resistance and strain type from WGS data
GNU General Public License v3.0
105 stars 43 forks

ERROR MESSAGE #371

Open tuelomogashoa opened 4 months ago

tuelomogashoa commented 4 months ago

BTB001_S140.errlog.txt

Please help resolve this error. I have attached one of the errlog.txt files.

jodyphelan commented 4 months ago

Did you install with conda? Can you try running the command delly on your command line and paste the output here?

tuelomogashoa commented 4 months ago

Hi,

I am not sure I understand how to run the command delly. Please clarify.

Tuelo


tuelomogashoa commented 4 months ago

Did you install with conda? Can you try run the command delly on your commandline and paste the output here?

tb-profiler error report

jodyphelan commented 4 months ago

Open up your terminal, type in delly, and then hit Enter.

Did you install with conda? I think delly is not working.
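For anyone unsure how to do this check, here is a minimal sketch. It assumes a POSIX shell with the tb-profiler conda environment already activated; `check_tool` is a hypothetical helper name, not part of TBProfiler or delly.

```shell
# Hypothetical helper: report whether a command is visible on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
    fi
}

# Check the tool that tb-profiler calls for structural variants.
check_tool delly
```

If delly is found, running `delly` on its own should normally print its usage text. An "error while loading shared libraries" message instead would suggest the binary is installed but cannot load one of its dependencies.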

tuelomogashoa commented 4 months ago

Yes I installed with conda


tuelomogashoa commented 4 months ago

When I type delly I get an error that reads “Error while loading shared libraries: libboost_iostreams.so.1.85.0: cannot open shared object file: No such file or directory”

I think this shows that delly is not working?


jodyphelan commented 4 months ago

I suspect the conda environment has not been solved correctly. You could download the conda environment file from here: https://raw.githubusercontent.com/jodyphelan/TBProfiler/dev/conda/linux-latest.txt to rebuild your environment. After downloading, just run the following commands to create a new environment with all the correct dependencies:

wget https://raw.githubusercontent.com/jodyphelan/TBProfiler/dev/conda/linux-latest.txt
conda create --name tb-profiler --file linux-latest.txt
conda activate tb-profiler
tb-profiler

jloubser commented 4 months ago

I apologise for chipping in here... we are getting some strange errors, @jodyphelan. Please see below. I just removed the openjdk line from the txt file, but I don't think this is why we are getting these errors.

(base) [tuelomogashoa@khaos Final_TBP_run]$ mamba create --name tb-profiler --file linux-latest.txt

Downloading and Extracting Packages: pillow-10.0.1 | #########################5 | 19% /home/tuelomogashoa/miniforge3/lib/python3.10/site-packages/tqdm/std.py:636: TqdmWarning: clamping frac to range [0, 1]####################################3 | 98%

Error with archive /home/tuelomogashoa/miniforge3/pkgs/mash-2.3-ha9a2dd8_3.tar.bz2. You probably need to delete and re-download or re-create this file. Message was:

failed with error: Invalid data stream HTTP 404 NOT FOUND for url https://conda.anaconda.org/t/no-15126f85-4dce-4462-b41b1e2_0.conda Elapsed: 00:01.005817 CF-RAY: 898e4289d8690718-CPT

An HTTP error occurred when trying to retrieve this URL. HTTP errors are often intermittent, and a simple retry will get you on your way.

HTTP 404 NOT FOUND for url https://conda.anaconda.org/t/no-15126f85-4dce-4462-b41b1e2_0.conda Elapsed: 00:01.005817 CF-RAY: 898e4289d8690718-CPT

An HTTP error occurred when trying to retrieve this URL. HTTP errors are often intermittent, and a simple retry will get you on your way.

InvalidArchiveError('Error with archive /home/tuelomogashoa/miniforge3/pkgs/mash-2.3-ha9a2dd8_3.tar.bz2. You probably need to delete and re-download or re-create this file. Message was:\n\nfailed with error: Invalid data stream') CondaHTTPError: HTTP 404 NOT FOUND for url https://conda.anaconda.org/t/no-15126f85-4dce-4462-b41b1e2_0.conda Elapsed: 00:01.005817 CF-RAY: 898e4289d8690718-CPT

An HTTP error occurred when trying to retrieve this URL. HTTP errors are often intermittent, and a simple retry will get you on your way.

CondaHTTPError: HTTP 404 NOT FOUND for url https://conda.anaconda.org/t/no-15126f85-4dce-4462-b41b1e2_0.conda Elapsed: 00:01.005817 CF-RAY: 898e4289d8690718-CPT

An HTTP error occurred when trying to retrieve this URL. HTTP errors are often intermittent, and a simple retry will get you on your way.

jloubser commented 4 months ago

Some context for removing the openjdk line... We have seen that it installed a 'strange' internal Java build that has given me issues in the past. Would this openjdk version be suitable for the latest version of TB Profiler:

(base) [tuelomogashoa@khaos Final_TBP_run]$ java -version
openjdk version "11.0.23" 2024-04-16 LTS
OpenJDK Runtime Environment (Red_Hat-11.0.23.0.9-2.el7_9) (build 11.0.23+9-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.23.0.9-2.el7_9) (build 11.0.23+9-LTS, mixed mode, sharing)

jloubser commented 4 months ago

I've restarted it all and used micromamba. I got a warning line for each of the packages: warning libmamba Could not validate package '/home/tuelomogashoa/micromamba/pkgs/xorg-recordproto-1.14.2-h7f98852_1002/info/repodata_record.json': md5 and sha256 sum unknown. Final line: Set safety_checks to disabled to override this warning.

But, delly seems to be working...

jodyphelan commented 4 months ago

Some context for removing the openjdk line... We have seen that it installed a 'strange' internal Java build that has given me issues in the past. Would this openjdk version be suitable for the latest version of TB Profiler: (base) [tuelomogashoa@khaos Final_TBP_run]$ java -version openjdk version "11.0.23" 2024-04-16 LTS OpenJDK Runtime Environment (Red_Hat-11.0.23.0.9-2.el7_9) (build 11.0.23+9-LTS) OpenJDK 64-Bit Server VM (Red_Hat-11.0.23.0.9-2.el7_9) (build 11.0.23+9-LTS, mixed mode, sharing)

Yes I think that should be fine - as long as trimmomatic works then that should be ok.

I've restarted it all. Used micromamba. Got an error line for each of the package: warning libmamba Could not validate package '/home/tuelomogashoa/micromamba/pkgs/xorg-recordproto-1.14.2-h7f98852_1002/info/repodata_record.json': md5 and sha256 sum unknown. Final line: Set safety_checks to disabled to override this warning.

I'm not sure why this is happening, but it seems to be an error specific to the system rather than the environment file?

Now that delly works ok, does tb-profiler run ok also?

taranewman commented 4 months ago

Hi @jodyphelan

Just a note that we've encountered the same delly issue relating to libboost_iostreams.so.1.85.0 and know a few others that have as well for v6.2.0 and 6.2.1 : https://github.com/bioconda/bioconda-recipes/issues/48755

I did not experience this issue when I updated tb-profiler within an existing conda environment using the pypi channel.

@dfornika has worked to pin boost-cpp to v1.85.0 on the delly recipe, however we are still experiencing this issue.

This may get updated, but we are currently seeing this error in our pipeline checks here: https://github.com/BCCDC-PHL/tbprofiler-nf/pull/38/checks

dfornika commented 4 months ago

I hope this is helpful, I'm still a bit confused about what is going on with all of this dependency resolution. But at this point, if I try to create a conda env for TBProfiler:

conda create -n tb-profiler-6.2.1-test tb-profiler=6.2.1

...I'm still getting these builds of boost-cpp and delly:

image

that is:

boost-cpp          conda-forge/linux-64::boost-cpp-1.78.0-h2c5509c_4
delly              bioconda/linux-64::delly-1.2.6-h6dccd9a_2

...and if I try to specify the new build of delly that requires boost-cpp v1.85.0 as follows:

conda create -n tb-profiler-6.2.1-test tb-profiler=6.2.1 delly=1.2.6=hdcf5f25_3

...then I get this:

image

...saying that usher wants an older version of boost-cpp.

But I'm confused because the usher recipe doesn't pin to any specific version of boost-cpp:

https://github.com/bioconda/bioconda-recipes/blob/4b4544a052e8fa802a19d793e7cdcdef88cc55b8/recipes/usher/meta.yaml#L32

...so I don't understand why conda thinks it needs those version ranges. I'll try to ask around on the bioconda gitter to see if anyone there understands what's going wrong.

jodyphelan commented 4 months ago

Interesting, not too sure what's going on here either. For now I could pin a lower version of delly to the tb-profiler recipe as it seems to work ok when delly=v1.1.6 is used?

apetkau commented 4 months ago

Installation seems to work for me for delly 1.2.6 build hb7e2ac5_1, but not for build h6dccd9a_2, which is currently the default that is installed.

Installation that doesn't work

That is, if I install tb-profiler using the below:

conda create --name tb-profiler 'tb-profiler=6.2.1' -y

I will get:

conda list --name tb-profiler 'delly|boost-cpp'

# Name                    Version                   Build  Channel
boost-cpp                 1.78.0               h2c5509c_4    conda-forge
delly                     1.2.6                h6dccd9a_2    bioconda

This installation doesn't work, since delly is linked to libboost 1.85.0 libraries:

ldd envs/tb-profiler/bin/delly

...
        libboost_iostreams.so.1.85.0 => not found
        libboost_filesystem.so.1.85.0 => not found
        libboost_program_options.so.1.85.0 => not found

Installation that is working

But, if I instead install tb-profiler and specify an older build of delly:

conda create --name tb-profiler 'tb-profiler=6.2.1' 'delly=1.2.6=hb7e2ac5_1' -y

I get:

conda list --name tb-profiler 'delly|boost-cpp'

# Name                    Version                   Build  Channel
boost-cpp                 1.78.0               h2c5509c_4    conda-forge
delly                     1.2.6                hb7e2ac5_1    bioconda

This installation does work, since the older build of delly seems linked to the boost 1.78.0 libraries:

ldd tb-profiler/bin/delly

...
        libboost_iostreams.so.1.78.0 => [...]/envs/tb-profiler/bin/../lib/libboost_iostreams.so.1.78.0 (0x00007ff211d8d000)
        libboost_filesystem.so.1.78.0 => [...]/envs/tb-profiler/bin/../lib/libboost_filesystem.so.1.78.0 (0x00007ff211d6d000)
        libboost_program_options.so.1.78.0 => [...]/envs/tb-profiler/bin/../lib/libboost_program_options.so.1.78.0 (0x00007ff211d1b000)

So the difference between delly 1.2.6 builds hb7e2ac5_1 and h6dccd9a_2 must be that they were built against different versions of boost.
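The ldd comparison above can be condensed into a small filter. This is a minimal sketch, assuming ldd output in the format shown in the two listings; `missing_libs` is a hypothetical helper name.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the name of
# every shared library that the dynamic loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Sample input copied from the failing environment above.
printf '%s\n' \
    '        libboost_iostreams.so.1.85.0 => not found' \
    '        libboost_filesystem.so.1.85.0 => not found' \
    '        libboost_program_options.so.1.85.0 => not found' \
    | missing_libs
```

Against a real environment this would be used as `ldd envs/tb-profiler/bin/delly | missing_libs`. Empty output would mean every linked library was resolved.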

dfornika commented 4 months ago

Thanks for doing those tests @apetkau, I hadn't found a way to get a successful dependency resolution and installation.

Based on what I'm seeing, I do think targeting an earlier version of delly (one that uses boost v1.78.0) would make sense. It looks like we need a way to get usher and delly aligned on which version of boost they want when they come together in the tb-profiler environment. I think technically the place to do that is probably the pathogen-profiler recipe because that's where delly is listed as a dependency.

I really hope I haven't made things worse by pinning the delly recipe against boost v1.85.0 or later. Please let me know if you think that was a bad move. I don't have much experience with C++ or the details of how linking is done with .so files, so it's not entirely clear to me how flexible delly is about which versions of boost it's compatible with. But my error logs were saying that delly wanted boost v1.85.0, and my conda environment had boost v1.78.0, so it seemed sensible to me to pin boost to v1.85.0 or later on the delly recipe.

jodyphelan commented 4 months ago

Thanks for all the help debugging this. I think for now I'll specify delly <=1.1.6 in the pathogen-profiler recipe.
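Until a recipe change like that is released, the same constraint could in principle be applied by hand when creating the environment. The sketch below only prints the command rather than running it. `build_create_cmd` is a hypothetical helper, and whether the solver accepts `delly<=1.1.6` alongside tb-profiler 6.2.1 is an assumption that has not been verified here.

```shell
# Hypothetical helper: print (do not run) a conda create command that
# adds an explicit delly spec next to the tb-profiler version.
build_create_cmd() {
    env_name=$1
    tbp_version=$2
    delly_spec=$3
    printf "conda create --name %s 'tb-profiler=%s' '%s' -y\n" \
        "$env_name" "$tbp_version" "$delly_spec"
}

# Same form as the commands used earlier in this thread, with a version bound.
build_create_cmd tb-profiler 6.2.1 'delly<=1.1.6'
```

Printing first makes it easy to review the exact specs before letting conda solve the environment.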

jloubser commented 4 months ago

@jodyphelan all seems to be working now, yes. Thank you!

jodyphelan commented 4 months ago

I might have to review this again, as I came across what looks like a bug in delly v1.1.6 which seems to be fixed in v1.2.6. There is a large deletion in SRR8651614 which has different end coordinates depending on which version you use. I looked more closely and it looks like v1.2.6 produces the right coordinates.

(pp) jody@s10:~/temp$ bcftools view 116.bcf | snpEff ann Mycobacterium_tuberculosis_h37rv | grep 2288545
Chromosome  2288545 DEL00000208 G   <DEL>   1200.0  PASS    PRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.6;END=2288732;PE=0;MAPQ=0;CT=3to5;CIPOS=-3,3;CIEND=-3,3;SRMAPQ=60;INSLEN=0;HOMLEN=4;SR=20;SRQ=1;CONSENSUS=AGTTCGAGATCGCGCAGCACCACCGTGCCGGAGACGATATCCAGATCGCGATGGAACGTGATATCCCGCGGCCCGATGAAGGTGTCGTAGAAGCGGCCGATGGCCTCATGCCCCACCTGCGGGCGACGGTGGTATCGGCCGACACACCCGCTGTCAGGTCCACCAGCACCCTGG;CE=1.92873;ANN=<DEL>|gene_fusion|HIGH|Rv2042c&pncA|Rv2042c&Rv2043c|gene_variant|Rv2042c|||n.2288732_2288546del||||||,<DEL>|frameshift_variant&start_lost|HIGH|Rv2042c|Rv2042c|transcript|CCP44815|protein_coding|1/1|c.-51_136del|p.Met1fs|136/798|1/798|1/265||,<DEL>|frameshift_variant&stop_lost&splice_region_variant|HIGH|pncA|Rv2043c|transcript|CCP44816|protein_coding|1/1|c.510_*135del|p.Ala170fs||510/561|170/186||,<DEL>|upstream_gene_variant|MODIFIER|Rv2037c|Rv2037c|transcript|CCP44810|protein_coding||c.-5011_-4825del|||||4825|WARNING_TRANSCRIPT_NO_START_CODON,<DEL>|upstream_gene_variant|MODIFIER|Rv2038c|Rv2038c|transcript|CCP44811|protein_coding||c.-3936_-3750del|||||3750|,<DEL>|upstream_gene_variant|MODIFIER|Rv2039c|Rv2039c|transcript|CCP44812|protein_coding||c.-3091_-2905del|||||2905|WARNING_TRANSCRIPT_NO_START_CODON,<DEL>|upstream_gene_variant|MODIFIER|Rv2040c|Rv2040c|transcript|CCP44813|protein_coding||c.-2202_-2016del|||||2016|,<DEL>|upstream_gene_variant|MODIFIER|Rv2041c|Rv2041c|transcript|CCP44814|protein_coding||c.-886_-700del|||||700|,<DEL>|upstream_gene_variant|MODIFIER|Rv2042c|Rv2042c|transcript|CCP44815|protein_coding|1/1|c.-51_136del|||||0|,<DEL>|upstream_gene_variant|MODIFIER|lppI|Rv2046|transcript|CCP44819|protein_coding||c.-2723_-2537del|||||2723|,<DEL>|downstream_gene_variant|MODIFIER|pncA|Rv2043c|transcript|CCP44816|protein_coding|1/1|c.510_*135del|||||135|,<DEL>|downstream_gene_variant|MODIFIER|Rv2044c|Rv2044c|transcript|CCP44817|protein_coding||c.*550_*736del|||||736|,<DEL>|downstream_gene_variant|MODIFIER|lipT|Rv2045c|transcript|CCP44818|protein_coding||c.*953_*1139del|||||1139|WARNING_TRANSCRIPT_NO_START_CODON,<DEL>|downstream_gene_variant|MODIFIER|Rv2047c|Rv2047c|transcript|CCP44820|protein_coding||c.*3230_*3416del|||||3416|WARNING_TRANSCRIPT_NO_START_CODON;LOF=(Rv2042c|Rv2042c|1|1.00),(pncA|Rv2043c|1|1.00) GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 1/1:-851.565,-64.5226,0:10000:PASS:30696:1224:26835:0:0:0:3:247
(pp) jody@s10:~/temp$ bcftools view 126.bcf | snpEff ann Mycobacterium_tuberculosis_h37rv | grep 2288545
Chromosome  2288545 DEL00000422 GGCTGCGAACCCACCGGGTCTTCGACCCGCGCGTCACCGGTGAACAACCCGACCCAGCCGGCGCGGTCGTGCGCGGCGGCCGCTTGCGGCGAGCGCTCCACCGCCGCCAACAGTTCATCCCGGTTCGGCGGTGCCATCAGGAGCTGCAAACCAACTCGACGCTGGCGGTGCGCATCTCCTCCAGCGC G   1200.0  PASS    PRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.2.6;END=2288731;PE=0;MAPQ=0;CT=3to5;CIPOS=-3,3;CIEND=-3,3;SRMAPQ=60;INSLEN=0;HOMLEN=4;SR=20;SRQ=1;CONSENSUS=TCGAGATCGCGCAGCACCACCGTGCCGGAGACGATATCCAGATCGCGATGGAACGTGATATCCCGCGGCCCGATGAAGGTGTCGTAGAAGCGGCCGATGGCCTCATGCCCCACCTGCGGGCGACGGTGGTATCGGCCGACACACCCGCTGTCAGGTCCACCAGCACCCTGG;CE=1.92313;CONSBP=118;ANN=G|gene_fusion|HIGH|Rv2042c&pncA|Rv2042c&Rv2043c|gene_variant|Rv2042c|||n.2288731_2288546del||||||,G|frameshift_variant&start_lost|HIGH|Rv2042c|Rv2042c|transcript|CCP44815|protein_coding|1/1|c.-50_136del|p.Met1fs|136/798|1/798|1/265||,G|stop_lost&conservative_inframe_deletion&splice_region_variant|HIGH|pncA|Rv2043c|transcript|CCP44816|protein_coding|1/1|c.511_*135del|p.Ala171_Ter187del||511/561|171/186||,G|upstream_gene_variant|MODIFIER|Rv2037c|Rv2037c|transcript|CCP44810|protein_coding||c.-5010_-4825del|||||4825|WARNING_TRANSCRIPT_NO_START_CODON,G|upstream_gene_variant|MODIFIER|Rv2038c|Rv2038c|transcript|CCP44811|protein_coding||c.-3935_-3750del|||||3750|,G|upstream_gene_variant|MODIFIER|Rv2039c|Rv2039c|transcript|CCP44812|protein_coding||c.-3090_-2905del|||||2905|WARNING_TRANSCRIPT_NO_START_CODON,G|upstream_gene_variant|MODIFIER|Rv2040c|Rv2040c|transcript|CCP44813|protein_coding||c.-2201_-2016del|||||2016|,G|upstream_gene_variant|MODIFIER|Rv2041c|Rv2041c|transcript|CCP44814|protein_coding||c.-885_-700del|||||700|,G|upstream_gene_variant|MODIFIER|Rv2042c|Rv2042c|transcript|CCP44815|protein_coding|1/1|c.-50_136del|||||0|,G|upstream_gene_variant|MODIFIER|lppI|Rv2046|transcript|CCP44819|protein_coding||c.-2723_-2538del|||||2723|,G|downstream_gene_variant|MODIFIER|pncA|Rv2043c|transcript|CCP44816|protein_coding|1/1|c.511_*135del|||||135|,G|downstream_gene_variant|MODIFIER|Rv2044c|Rv2044c|transcript|CCP44817|protein_coding||c.*551_*736del|||||736|,G|downstream_gene_variant|MODIFIER|lipT|Rv2045c|transcript|CCP44818|protein_coding||c.*954_*1139del|||||1139|WARNING_TRANSCRIPT_NO_START_CODON,G|downstream_gene_variant|MODIFIER|Rv2047c|Rv2047c|transcript|CCP44820|protein_coding||c.*3231_*3416del|||||3416|WARNING_TRANSCRIPT_NO_START_CODON;LOF=(Rv2042c|Rv2042c|1|1.00)    GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 1/1:-851.565,-64.5226,0:10000:PASS:30696:1224:26835:0:0:0:3:247

It looks like v1.1.6 ends one nucleotide after the actual deletion end, and this has a knock-on effect on the snpEff annotation (frameshift vs in-frame mutation), which will lead to a different drug resistance prediction for pyrazinamide. Though rare, I don't think we should be compromising the resistance predictions just so we can have usher there.
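The one-base difference can be checked by hand from the two pncA annotations (c.510_*135del vs c.511_*135del). This is a rough sketch of the arithmetic only, not of how snpEff classifies variants; the CDS length of 561 is read from the "510/561" and "511/561" fields, and `coding_bases_deleted` is a hypothetical helper name.

```shell
# pncA CDS length, taken from the "510/561" and "511/561" fields above.
cds_len=561

# Hypothetical helper: number of pncA coding bases removed when the
# deletion starts at coding position $1 and runs past the end of the CDS.
coding_bases_deleted() {
    echo $(( cds_len - $1 + 1 ))
}

for start in 510 511; do
    n=$(coding_bases_deleted "$start")
    if [ $(( n % 3 )) -eq 0 ]; then
        kind="in-frame"
    else
        kind="frameshift"
    fi
    echo "c.${start}: ${n} coding bases deleted -> ${kind}"
done
# prints:
# c.510: 52 coding bases deleted -> frameshift
# c.511: 51 coding bases deleted -> in-frame
```

A single extra base at the deletion boundary moves the count from a multiple of three (51) to one that is not (52), which is consistent with the frameshift_variant call from v1.1.6 and the conservative_inframe_deletion call from v1.2.6.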

Not really sure how to proceed. In the short term I could remove usher?