viralInformatics / VIGA


Launch problem #8

Open SergeyBaikal opened 1 month ago

SergeyBaikal commented 1 month ago

Dear Authors. I have not been able to launch it.

  1. Trinity sees only the left reads; it seems that fastp produces only left reads.
  2. As far as I understand, I can use my own databases, i.e. I already have RefSeq and NR (DIAMOND). I used a conda environment to install the old version of Python.
  3. In addition, I do not see the test FASTQ files in the "test" directory.
  4. One more question: I don't see a description of the scripts 0_runall.py and count.py. What do these scripts do?
  5. Why is there a limit on the number of threads used? ERROR: thread number (--thread) should be 1 ~ 16, suggest 1 ~ 8

git clone https://github.com/viralInformatics/VIGA.git
conda config --set channel_priority disabled
conda create -n VIGA python=3.6.8
conda activate VIGA

conda install -c bioconda fastp=0.12.4
conda install -c bioconda trinity=2.8.5
conda install -c bioconda ragtag=2.1.0
conda install -c bioconda quast=5.0.2
conda install -c bioconda diamond # I was unable to install the version diamond=2.0.11.149
conda install -c bioconda samtools # The program output asked for SAMtools to be installed

pip install pandas==1.1.5 numpy==1.19.5 matplotlib==3.3.4 biopython==1.79

Attached log: log.txt

viralInformatics commented 1 month ago

Hello,

Regarding your questions:

  1. Trinity only sees left reads, and fastp seems to only produce left reads.

    I reviewed your log file and found that Trinity failed with the following error: Error, not recognizing read name formatting: [M01026:19:000000000-BNDV5:1:1101:1948:14599]. You can refer to this answer: https://github.com/trinityrnaseq/trinityrnaseq/issues/364

  2. Using your own databases (RefSeq and NR for DIAMOND).

    Yes, you can definitely use your own databases. If you already have RefSeq and NR databases formatted for DIAMOND, you can specify their paths in your analysis. This allows you to work with familiar datasets and ensures consistency with your previous work.

    Any Python 3 version will work. Installing an older version of Python via conda is a good way to ensure compatibility with scripts that require it. Just make sure that all the dependencies and packages you need are compatible with that Python version.

  3. Missing test FASTQ files in the "test" directory. Since we have provided the sample ID for the test, you can download the raw files from the ENA database. Therefore, we did not upload them.

  4. No description for the scripts 0_runall.py and count.py. If it's paired-end sequencing, the 0_runall.py script can run both the virus identification and virus assembly steps. The count.py script generates a consensus sequence by calculating the nucleotide frequency at each position on the reference genome from the BAM file. This result can be used as a reference for our assembled genome. For example:

The resulting consensus sequence (True): CTCCT
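To make the idea behind count.py concrete, here is a minimal sketch of a majority-base consensus. It is not the actual count.py: the function name `consensus_from_columns`, its parameters, and the toy pileup columns are all hypothetical, and the columns are passed in directly as strings rather than read from a BAM file.

```python
from collections import Counter


def consensus_from_columns(columns, min_depth=1, gap_char="N"):
    """Return a majority-base consensus from per-position base observations.

    `columns` is a list of strings; each string holds the bases observed in
    the reads covering one reference position (hypothetical input format).
    """
    consensus = []
    for bases in columns:
        # Count only unambiguous nucleotides, case-insensitively.
        counts = Counter(b.upper() for b in bases if b.upper() in "ACGT")
        depth = sum(counts.values())
        if depth < min_depth:
            # Not enough coverage at this position: emit a placeholder.
            consensus.append(gap_char)
            continue
        # Highest count wins; ties are broken alphabetically so the
        # result is deterministic.
        best_base, _ = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        consensus.append(best_base)
    return "".join(consensus)


# Made-up columns for five reference positions (illustration only).
toy_columns = ["CCCT", "TTTT", "CCAC", "CCCC", "TTGT"]
print(consensus_from_columns(toy_columns))  # prints CTCCT
```

In a real run, the per-position observations would come from the alignments in the BAM file, and a higher `min_depth` could be used to mask poorly covered positions.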

  5. Limit on the number of threads used: ERROR: thread number (--thread) should be 1 ~ 16, suggest 1 ~ 8

    Some of the tools in our integrated methods, fastp for example, recommend using fewer than 20 threads, so we have limited the number of threads accordingly.
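A wrapper script could reject an out-of-range thread count up front instead of failing partway through a run. The sketch below only illustrates that idea and is not VIGA's code: `thread_count`, `build_parser`, the default of 8, and the constant `MAX_THREADS` are assumptions, with the 1 to 16 range taken from the error message quoted above.

```python
import argparse

# Upper bound assumed from the quoted error message ("should be 1 ~ 16").
MAX_THREADS = 16


def thread_count(value):
    """argparse type callable: accept only integers in 1..MAX_THREADS."""
    n = int(value)  # a non-integer raises ValueError, which argparse reports
    if not 1 <= n <= MAX_THREADS:
        raise argparse.ArgumentTypeError(
            f"thread number (--thread) should be 1 ~ {MAX_THREADS}, got {n}"
        )
    return n


def build_parser():
    """Build a hypothetical command-line parser with a validated --thread."""
    parser = argparse.ArgumentParser(description="hypothetical pipeline wrapper")
    parser.add_argument("--thread", type=thread_count, default=8,
                        help=f"worker threads (1 ~ {MAX_THREADS})")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--thread", "8"])
print(args.thread)  # prints 8
```

Passing `--thread 32` to this parser would make argparse exit with a usage error before any tool is started.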

I hope this addresses your concerns. Let me know if you need further assistance!

Best regards, pingfu

SergeyBaikal commented 1 month ago

Thanks so much for the detailed answers!

1. Trinity only sees left reads, and fastp seems to only produce left reads. Indeed, that was the cause. It's all working now. I am pasting the solution here in case someone has the same problem. The read names in the input FASTQ files need a small fix: a /1 or /2 suffix.

awk '{if (NR%4 == 1) print $1 "/1"; else print $0;}' reads_1.fastq > reads_1.fastq.corrected
awk '{if (NR%4 == 1) print $1 "/2"; else print $0;}' reads_2.fastq > reads_2.fastq.corrected

https://github.com/trinityrnaseq/trinityrnaseq/issues/364#issuecomment-600389854
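For anyone who prefers to stay in Python, the same header fix can be sketched as follows. This is a rough equivalent of the awk commands above, assuming plain four-line FASTQ records; the function name `add_mate_suffix` is hypothetical, and the ` 1:N:0:1` header comment in the demo record is an assumed example rather than something taken from the log. Unlike the awk version, it skips names that already carry the suffix.

```python
import io


def add_mate_suffix(src, dst, mate):
    """Copy FASTQ text from src to dst, ending each read name with /<mate>.

    Assumes strict four-line records (header, sequence, '+', qualities).
    Like the awk one-liner, only the first whitespace-separated field of
    the header is kept.
    """
    suffix = f"/{mate}"
    for index, line in enumerate(src):
        if index % 4 == 0:  # header line of each record
            fields = line.split()
            name = fields[0] if fields else ""
            if not name.endswith(suffix):
                name += suffix
            dst.write(name + "\n")
        else:
            dst.write(line)


# Demo on an in-memory record; the header comment " 1:N:0:1" is assumed.
record = "@M01026:19:000000000-BNDV5:1:1101:1948:14599 1:N:0:1\nACGT\n+\nIIII\n"
out = io.StringIO()
add_mate_suffix(io.StringIO(record), out, 1)
print(out.getvalue().splitlines()[0])
# prints @M01026:19:000000000-BNDV5:1:1101:1948:14599/1

# With real files, something along these lines should work:
# with open("reads_1.fastq") as src, open("reads_1.fastq.corrected", "w") as dst:
#     add_mate_suffix(src, dst, 1)
```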

viralInformatics commented 1 month ago

Thank you!