xinehc / args_oap

ARGs-OAP: Online Analysis Pipeline for Antibiotic Resistance Genes Detection from Metagenomic Data Using an Integrated Structured ARG Database
MIT License

Regarding error related to fastq files #64

Open chanchalrana opened 6 months ago

chanchalrana commented 6 months ago

Hello! I am running args_oap on fastq files, and in stage one I get this error: [screenshot]

Could you suggest a solution? I get it for all the files.

chanchalrana commented 6 months ago

With the new version of ARGs-OAP, I get this error: [screenshot]

It would be great if you can help me out with this issue.

xinehc commented 6 months ago

Your fastq file is truncated/incomplete somehow. Did you by any chance concatenate a fastq with a fasta file?

The + line is missing at line 29480427, please double-check the file.
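
A positional check is one way to test this kind of structural claim without extra tools: in a four-line fastq record, line 1 starts with @ and line 3 starts with +. Searching for a + character alone is not enough, because quality strings may also contain + and @. The sketch below is only an illustration and is not part of args_oap or diamond. The function name check_fastq is hypothetical, it assumes strict four-line records (no wrapped sequences), and it relies on gzip -cdf passing uncompressed input through unchanged, as GNU gzip does.

```shell
# check_fastq FILE  (hypothetical helper, not part of args_oap)
# Prints "ok" and returns 0 if every record has the four-line layout
# @header / sequence / +separator / quality; otherwise prints the first
# offending line number and returns 1.
# Assumption: gzip -cdf copies non-gzip input through unchanged.
check_fastq() {
    gzip -cdf < "$1" | awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { print "line " NR ": expected @ header"; bad = 1; exit }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { print "line " NR ": expected + separator"; bad = 1; exit }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 0 && length($0) != seqlen { print "line " NR ": quality and sequence lengths differ"; bad = 1; exit }
        END {
            # A line count that is not a multiple of four means a truncated record.
            if (!bad && NR % 4 != 0) { print "line " NR ": file ends mid-record"; bad = 1 }
            if (!bad) print "ok"
            exit bad + 0
        }'
}

# Usage (file name taken from the thread):
#   check_fastq input/subset_HeP-146-19.fq.gz
```

A fasta record appended to a fastq file shows up here as a header line that does not start with @, which matches the "mixture of fq/fa records" scenario.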

chanchalrana commented 6 months ago

The file is a metagenomic fastq file in gzip format. I checked, and there is a + sign at line 29480427. These are infant stool samples. I cannot figure out the problem. [screenshot]

xinehc commented 6 months ago

I would recommend running seqkit sana on your fastq file to see whether there are malformed records.

Reference: https://bioinf.shenwei.me/seqkit/usage/#sana

chanchalrana commented 6 months ago

Yes, I used it. It reports no errors. [screenshots] What else can I do to find out more about this error?

When I run the files, the extracted.fa file is generated but the metadata file remains empty.

xinehc commented 6 months ago

Did you check file <input/HeP-1057-162.fa>? Although it is named .fa, diamond identifies it as a fq file. Based on your screenshot, the file should indeed be a fq file, so my guess is that this particular file is either truncated or contains a mixture of fq/fa records. Something could be wrong with the upstream preprocessing. Why is it named .fa?

chanchalrana commented 6 months ago

Yes, earlier I changed the extension of the fq.gz files to .fa after extraction. But then I ran the pipeline on the .fq.gz files only (the ones I got after quality check, adaptor trimming and removal of host sequences), and I get the same error.

Yes, all the files are fastq files in gzip format, and all of them give this same error. How can I tell whether the files are truncated? The de novo assemblies from these fastq files are just fine.

Everything in the downstream analysis is just fine. I don't know why it is giving this error.

xinehc commented 6 months ago

Try seqkit fq2fa to convert your file into fa format, then run args_oap on the converted files. If something is wrong with your file, seqkit should raise errors.

chanchalrana commented 6 months ago

[screenshots] What might be the issue? I cannot figure it out.

In metadata.txt, if only the nread column is important, we can create it manually, right? The extracted.fa file is generated in stage 1 and it contains reads (it is not empty), so I think that should work. Is that correct?

Also, I ran one file with an older version of args_oap on my desktop and it worked, but when I ran the same file with the new version, it gave the same error.

xinehc commented 6 months ago

> What might be the issue? I cannot figure it out.

The output of seqkit fq2fa is not gzipped, so you need to remove the .gz suffix; otherwise the file will be mistaken for a gzipped one.

> In metadata.txt, if only the nread column is important, we can create it manually, right?

Please do not edit the metadata file manually. It may lead to unexpected results.

> Also, I ran one file with an older version of args_oap on my desktop and it worked, but when I ran the same file with the new version, it gave the same error.

I am actually not sure what is wrong with your input file. If you don't mind, please attach a minimal reproducible example file so that I can check.

chanchalrana commented 6 months ago

subset_HeP-146-19.fq.gz

Is this sufficient? Please let me know.

Also, why does the same file give a result when run with the older version of ARGs-OAP (on my desktop) but not when run with the recent version (on the server)?

xinehc commented 6 months ago

If you remove .gz then it should work:

wget https://github.com/xinehc/args_oap/files/15271838/subset_HeP-146-19.fq.gz

mkdir -p input
mv subset_HeP-146-19.fq.gz input/subset_HeP-146-19.fq

args_oap stage_one -i input -o output -f fq
args_oap stage_two -i output

Your files are not gzipped. You can simply check whether a file is gzipped using gzip -t.
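
As a rough sketch of that check: gzip -t exits non-zero both for a file that was never compressed and for a gzip file that is damaged, so the example below first looks at the two-byte gzip magic number (1f 8b) to tell the cases apart. The function name gz_status and its three output labels are hypothetical, chosen only for this illustration.

```shell
# gz_status FILE  (hypothetical helper)
# Prints one of:
#   plain         the file does not start with the gzip magic bytes 1f 8b
#   gzip-ok       gzip stream that passes the gzip -t integrity test
#   gzip-corrupt  gzip stream that fails gzip -t (e.g. truncated)
gz_status() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" != "1f8b" ]; then
        echo "plain"
    elif gzip -t < "$1" 2>/dev/null; then
        echo "gzip-ok"
    else
        echo "gzip-corrupt"
    fi
}

# Usage: drop a misleading .gz suffix only when the content is plain text.
#   f=input/subset_HeP-146-19.fq.gz
#   [ "$(gz_status "$f")" = "plain" ] && mv "$f" "${f%.gz}"
```

Renaming only in the "plain" case avoids stripping the suffix from a file that really is gzip but damaged, which would need re-downloading or re-compressing instead.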

bioinfogini commented 4 months ago

Hello @xinehc, sorry to jump in, but I am facing a similar issue, although I have paired-end files. What I have are trimmed, paired, compressed fastq files. As they are compressed in zip, I decompressed them using pigz. Then I tried to:
1) convert them with seqkit fq2fa
2) deinterleave them with reformat.sh from bbmap
3) run args_oap stage_one -i input2 -o output2 -f fa

This returned: WARNING: Something is wrong with <input2/R1.fa>, skip.

I also tried to:
1) deinterleave them with reformat.sh from bbmap
2) check them with seqkit sana
3) run args_oap stage_one -i input -o output -f fq

This returned: WARNING: Something is wrong with <input2/R1.fa>, skip

I can't really understand what is wrong. I am running args_oap in a conda env on a server where I have no time limits. Can you please help? Thanks

I have attached one of my files as an example. Thanks in advance

https://drive.google.com/file/d/1y-20f7rUJX2cm3rYBIdmxPtdI-iq6qec/view?usp=drive_link

chanchalrana commented 4 months ago

My issue was resolved. First, there is no need to convert the files to .fa, as ARGs-OAP can handle gzipped fastq files too; just replace .fa with .fq in the command line. Second, I was working on a supercomputer and running args_oap directly in the command shell. That was my mistake. Instead, jobs on the supercomputer have to be submitted using sbatch or another predefined command, depending on the server.

I tried everything, but this was the actual problem. There was nothing wrong with the sequences.

Also, the same command runs on the local system with the same file that was giving the error on the server.
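
For readers on a SLURM cluster, a submission along those lines might look like the sketch below. It is only an illustration: the function name make_job_script is hypothetical, every #SBATCH value is a placeholder to be adapted to the cluster's own limits, and any environment setup the cluster needs (for example activating a conda env) is left out. The two args_oap commands are the ones shown earlier in this thread. The thread does not establish why running through the scheduler made the difference.

```shell
# make_job_script INPUT_DIR OUTPUT_DIR  (hypothetical helper)
# Prints a SLURM batch script that runs both args_oap stages.
# All #SBATCH resource values are placeholders, not recommendations.
make_job_script() {
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=args_oap
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=04:00:00
#SBATCH --output=args_oap_%j.log

set -euo pipefail
args_oap stage_one -i "$1" -o "$2" -f fq
args_oap stage_two -i "$2"
EOF
}

# Usage:
#   make_job_script input output > args_oap.sbatch
#   sbatch args_oap.sbatch
```

Generating the script from a function keeps the input and output directories as the only per-run parameters, which is convenient when submitting one job per sample directory.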

bioinfogini commented 4 months ago

@chanchalrana thanks a lot for your feedback, I will try the same procedure! My only problem is that I also have to deinterleave my files, since they are paired-end while yours were not, if I understood correctly. I'll keep you posted!

UPDATE: I was not able to run args_oap on the server, but it worked on my laptop. Thanks @chanchalrana for the feedback.