ncbi / fcs

Foreign Contamination Screening caller scripts and documentation

run_fcsadaptor.sh is not trimming #22

Closed gunjanpandey closed 1 year ago

gunjanpandey commented 1 year ago

I ran the following command:

    run_fcsadaptor.sh --fasta-input ${genome} --output-dir adaptor_out/ --image /apps/fcs-genome/0.2.2/dist/fcs-adaptor.sif --container-engine singularity --euk

and got the following result:

    #accession    length  action       range           name
    contig_14529  37558   ACTION_TRIM  37184..37244    CONTAMINATION_SOURCE_TYPE_ADAPTOR:NGB01054.1:Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT26
    contig_2502   251119  ACTION_TRIM  106358..106405  CONTAMINATION_SOURCE_TYPE_ADAPTOR:NGB01054.1:Rubicon Genomics ThruPLEX DNA-seq single-index iPCRtagT26

However, the input ${genome} and the output file adaptor_out/cleaned_sequences/${genome} are identical: no trimming happened, even though the script finished successfully.

etvedte commented 1 year ago

Hello,

This is expected behavior. The TRIM calls are >100 bp from the ends of sequences, and therefore are not removed automatically. See https://github.com/ncbi/fcs/wiki/FCS-adaptor#rules-for-action-assignment.
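To see how far each reported range sits from the sequence ends, a small check like the following may help. It is only a sketch, not FCS-adaptor's own code: the function names are hypothetical, the 100 bp threshold is taken from the explanation above, and the coordinates are assumed to be 1-based and inclusive.

```python
def distance_from_ends(length, start, end):
    """Return (bases before the range, bases after the range).

    Assumes 1-based, inclusive coordinates (an assumption, not confirmed here).
    """
    return start - 1, length - end


def is_internal(length, start, end, threshold=100):
    """True if the range is more than `threshold` bp from both sequence ends."""
    left, right = distance_from_ends(length, start, end)
    return min(left, right) > threshold


# Values copied from the report in this issue: (accession, length, start, end).
rows = [
    ("contig_14529", 37558, 37184, 37244),
    ("contig_2502", 251119, 106358, 106405),
]

for name, length, start, end in rows:
    left, right = distance_from_ends(length, start, end)
    print(name, "left:", left, "right:", right,
          "internal:", is_internal(length, start, end))
```

Under these assumptions both ranges are more than 100 bp from either end, which is consistent with neither being trimmed automatically.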

The rationale for this behavior is that the proper corrective action is uncertain. An adaptor sequence in the middle of a contig could be the result of a false contig join, in which case the best action would be to split the contig in two at the adaptor site. If it is not a false contig join, then one could simply mask that portion of the sequence. I suggest looking at the regions and the surrounding sequence more closely to determine the best action for your use case.
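The two options can be sketched on a plain sequence string as follows. These helpers are hypothetical illustrations, not part of FCS-adaptor, and they again assume 1-based, inclusive coordinates as they appear in the report's range column.

```python
def mask_range(seq, start, end, mask_char="N"):
    """Replace positions start..end (1-based, inclusive) with mask_char."""
    return seq[:start - 1] + mask_char * (end - start + 1) + seq[end:]


def split_at_range(seq, start, end):
    """Drop positions start..end (1-based, inclusive) and return the two flanks."""
    return seq[:start - 1], seq[end:]


# Toy sequence; the middle four bases stand in for an adaptor hit.
seq = "AAAACCCCGGGG"
print(mask_range(seq, 5, 8))
print(split_at_range(seq, 5, 8))
```

Masking keeps the contig length and coordinates unchanged, whereas splitting yields two shorter sequences that would need new names. Which one is appropriate depends on whether the surrounding sequence supports the join.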

pstrope commented 1 year ago

Closing. Please follow up if you have other questions.