liaoherui / StrainScan

High-resolution strain-level microbiome composition analysis tool based on reference genomes and k-mers
https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-023-01615-w
MIT License

IndexError: list index out of range #13

Closed: wangqi0000 closed this issue 10 months ago

wangqi0000 commented 12 months ago

Hello, when I run the following command to identify strains:

python StrainScan.py -i E.coli-3lr.fastq -d ./download_database/DB_Ecoli -b 1 -o ./test

I get the error shown in the attached screenshot (IndexError: list index out of range). I don't know what's wrong.

By the way, these are the commands I used to install StrainScan:

git clone https://github.com/liaoherui/StrainScan.git
cd StrainScan
conda env create -f environment_candidate.yaml
conda activate strainscan
chmod 755 library/jellyfish-linux
chmod 755 library/dashing_s128

Thanks for any feedback you might have!

liaoherui commented 12 months ago

Hi, thanks for using StrainScan.

It seems this error could be caused by a bug in the identification of low-depth strains. Would you mind sending us your test data for debugging? We will then look into the cause and provide a fix as soon as possible. Thanks!

wangqi0000 commented 12 months ago

The test data "E.coli-3lr.fastq" is already on Baidu Netdisk.

Baidu Netdisk link: https://pan.baidu.com/s/131vdOx97DERIJwYnK2lZhQ Extraction code: khdy

Thanks!!

liaoherui commented 11 months ago

Hi, sorry for the late reply.

We have fixed the bug in the latest GitHub version. Please re-install StrainScan from GitHub and run it on your data to see whether it works (we tested the program with the data you provided, and it ran successfully). Thanks!

wangqi0000 commented 11 months ago

Thanks for your reply! After re-installing StrainScan, it runs successfully on the test data "E.coli-3lr.fastq"! The command I ran was:

python StrainScan.py -i E.coli-3lr.fastq -d ./download_database/DB_Ecoli -b 1 -o ./test

However, I have another problem. After running StrainScan, I got a warning: "Warning: No clusters can be detected!" (even though I used the parameter "-b 1"), and the only output file was "strain_prob.txt", shown in the attached screenshot.

Does that mean no strain was identified, or did I run it incorrectly again?

liaoherui commented 11 months ago

Hi,

I have reviewed your output, and it appears reasonable. Using -b 1 means there could be low-depth strain(s) in your input data, so it is normal for no strain clusters to be detected. However, with the -b 1 option, the generated "strain_prob.txt" file provides the probability that each strain cluster exists in your input data. In the screenshot you shared, C33 has a probability of approximately 75%, suggesting that this cluster could be present in your input data. The final column of this file lists the strains associated with the cluster. In general, the higher the probability, the more likely the cluster is present in your input data.

I hope this explanation clarifies the output for you. If you have any further questions or encounter any other issues, please don't hesitate to ask.
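
For readers who want to rank the clusters in "strain_prob.txt" by probability, here is a minimal Python sketch. Only the fact that the final column lists the strains comes from this thread. The tab delimiter, the cluster ID in the first column, and the probability in the second column are assumptions, so adjust cluster_col, prob_col, and delimiter to match your actual file. The function name rank_clusters is hypothetical and not part of StrainScan.

```python
import csv
from typing import List, NamedTuple


class ClusterProb(NamedTuple):
    cluster: str
    probability: float
    strains: str


def rank_clusters(path: str, cluster_col: int = 0, prob_col: int = 1,
                  delimiter: str = "\t") -> List[ClusterProb]:
    """Return the clusters in a strain_prob.txt-like file, highest probability first.

    ASSUMPTION: cluster_col, prob_col and delimiter are guesses about the
    file layout. Only "last column = strains" is stated in the thread.
    """
    rows = []
    with open(path, newline="") as handle:
        for fields in csv.reader(handle, delimiter=delimiter):
            if len(fields) <= max(cluster_col, prob_col):
                continue  # skip blank or short lines
            try:
                # Strip a trailing percent sign if the value is written like "75%".
                prob = float(fields[prob_col].rstrip("%"))
            except ValueError:
                continue  # skip a header line or any non-numeric row
            rows.append(ClusterProb(fields[cluster_col], prob, fields[-1]))
    return sorted(rows, key=lambda row: row.probability, reverse=True)
```

Calling rank_clusters("./test/strain_prob.txt") would then return the most probable cluster first, provided the assumed layout holds for your file.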

wangqi0000 commented 11 months ago

Thanks for your explanation! That helps me a lot!

wangqi0000 commented 11 months ago

By the way, can I specify the name of the output file? I can only find an option to specify the directory where the output file is written. Thanks!!

liaoherui commented 11 months ago

Currently, StrainScan does not have that option, but a Python script can achieve it. If you need it, please email me (heruiliao2-c@my.cityu.edu.hk) and I can help you with that.
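
As a rough illustration of the kind of script meant here (this is not the author's script, and the function name add_prefix_to_outputs is hypothetical), one option is to let StrainScan write into the directory given by -o and then rename the files afterwards. The sketch below assumes only that the result files sit directly inside that directory. It makes no assumption about their names.

```python
import os
from typing import List


def add_prefix_to_outputs(out_dir: str, prefix: str) -> List[str]:
    """Rename every regular file directly inside out_dir to '<prefix>_<name>'.

    Returns the new paths. Subdirectories and files that already carry
    the prefix are left untouched.
    """
    renamed = []
    for name in sorted(os.listdir(out_dir)):
        src = os.path.join(out_dir, name)
        if not os.path.isfile(src) or name.startswith(prefix + "_"):
            continue  # skip subdirectories and already-renamed files
        dst = os.path.join(out_dir, f"{prefix}_{name}")
        if os.path.exists(dst):
            # Refuse to overwrite an existing file.
            raise FileExistsError(dst)
        os.rename(src, dst)
        renamed.append(dst)
    return renamed
```

For example, after a run with -o ./test, calling add_prefix_to_outputs("./test", "sample1") would turn "strain_prob.txt" into "sample1_strain_prob.txt". Renaming after the run, rather than changing StrainScan itself, keeps the tool's own file handling untouched.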