Closed carlos-astudillo closed 4 years ago
Carlos, the failure of this step indicates to foreign contamination. There should be *.tab output that lists regions.
If you wish PGAP to run regardless of the suspected contamination run pgap.py with the flag --ignore-all-errors
.
To see where in the genome the vector/adaptor contamination calls are you need to run pgap.py with -d
(debug mode). The coordinates of the suspect regions will be in the file debug/tmp-outdir/*/calls.tab.
Carlos, the failure of this step indicates to foreign contamination. There should be *.tab output that lists regions.
Thank you azat-badretdin. I was not able to find any calls.tab file, but as Thibaudnis said, it seems that I need to enable the debug mode. I will run with this flag enabled and check the .tab file. I will let you know the result.
Thanks again
If you wish PGAP to run regardless of the suspected contamination run pgap.py with the flag
--ignore-all-errors
. To see where in the genome the vector/adaptor contamination calls are you need to run pgap.py with-d
(debug mode). The coordinates of the suspect regions will be in the file debug/tmp-outdir/*/calls.tab.
Thank you Thibaudnis, I will try this. If there is any region in the calls.tab, what can I do? I will need to extract the DNA again and perform DNA sequencing again?
If there is any region in the calls.tab, what can I do?
After you confirm the calls, you could drop the regions from the assembly by replacing them with gaps (stretches of Ns), or you can try to trim and filter the raw reads better and redo the assembly. That sort of depends on how many flagged regions were found.
If you wish PGAP to run regardless of the suspected contamination run pgap.py with the flag
--ignore-all-errors
. To see where in the genome the vector/adaptor contamination calls are you need to run pgap.py with-d
(debug mode). The coordinates of the suspect regions will be in the file debug/tmp-outdir/*/calls.tab.Thank you Thibaudnis, I will try this. If there is any region in the calls.tab, what can I do? I will need to extract the DNA again and perform DNA sequencing again?
I performed the following step to identify the contamination regions
Once the contaminant was detected I used the script at the link below to remove the contaminant https://github.com/htafer/remoVecSec/blob/master/README.md
python3 removeContaminant.py --genomefile scaffolds_5.fasta -c contam_in_prok.fa > scaffolds_5.corr.fasta
After that, the generated draft sequence scaffolds_5.fasta.corr passed the screen_evaluate :D. Actually, the script removed a complete scaffold of almost the same length as the phiX174 sequence length.
Thanks to azat-badretdin and Thibaudnis for the help.
PD: Just for curiosity, someone knows the virus phiX174 and its effect on the bacteria being studied? It seems that it is a virus that may be present in the DNA of bacteria.
Glad this worked. We screen against phiX174 because it is a spike-in control used by Illumina. There is small chance that the phiX174 sequence is naturally occurring in the genome you sequenced, but chances are a lot higher that it is from the spike-in.
Based on your comment I found the paper below explained what you said. "Large-scale contamination of microbial isolate genomes by Illumina PhiX control" https://environmentalmicrobiome.biomedcentral.com/articles/10.1186/1944-3277-10-18
Thank you again.
I had a problem while running PGAP and I was not able to find the source of this error.
The following is the part of the cwltool.log
[2020-02-19 13:56:21] INFO [workflow FSCR_Calls_first_pass] completed success [2020-02-19 13:56:21] INFO [step FSCR_Calls_first_pass] completed success [2020-02-19 13:56:21] INFO [workflow vecscreen] starting step screen_evaluate [2020-02-19 13:56:21] INFO [step screen_evaluate] start [2020-02-19 13:56:21] INFO [job screen_evaluate] /tmp/2cv20i8v$ screen_evaluate \ -ifmt \ seq-annot \ -tab \ /tmp/tmpo9x7foja/stg617eb1d3-8aa2-411b-bcf6-dd05dc4bb8a5/calls.tab [2020-02-19 13:56:21] WARNING [job screen_evaluate] completed permanentFail [2020-02-19 13:56:21] WARNING [step screen_evaluate] completed permanentFail [2020-02-19 13:56:21] INFO [workflow vecscreen] completed permanentFail [2020-02-19 13:56:21] WARNING [step vecscreen] completed permanentFail [2020-02-19 13:56:21] INFO [workflow ] completed permanentFail { "gbk": null, "gff": null, "input_fasta": { "class": "File", "location": "file:///pgap/output/scaffolds_5.fasta", "size": 4412918, "basename": "scaffolds_5.fasta", "checksum": "sha1$819756e73ce6be95f9e779a4ba314362f7c4d51b", "path": "/pgap/output/scaffolds_5.fasta" }, "input_submol": { "class": "File", "location": "file:///pgap/output/submol.yaml", "size": 1676, "basename": "submol.yaml", "checksum": "sha1$9bcd9d13fdfcbe339a9da3d8ea0eb88cd307200d", "path": "/pgap/output/submol.yaml" }, "nucleotide_fasta": null, "protein_fasta": null, "sqn": null }
Thank you for any help you can give me