ncbi / fcs

Foreign Contamination Screening caller scripts and documentation

--split_fasta #19

Closed tillenglert closed 1 year ago

tillenglert commented 1 year ago

Hi,

I just wanted to ask whether there is any documentation on the split_fasta parameter, or whether you could tell me what exactly it does?

Thank you in advance,

Till

pstrope commented 1 year ago

$ singularity exec fcs-gx.0.2.3.sif gx --help
SYNOPSIS
    gx split-fasta [-i <input=stdin>] [-o <output=stdout>]

OPTIONS
    split-fasta     command: Split fasta from stdin on N-runs of length at least 10. --input is fasta.

$ singularity exec --bind $PWD:/host fcs-gx.0.2.3.sif gx split-fasta --input=/host/test.fa

tillenglert commented 1 year ago

Hi, thank you for your answer.

So this option splits the FASTA file into N chunks for internal analysis, or did I get this wrong? Will it change the reproducibility of larger analyses? I couldn't see a difference for the test-only dataset.

pstrope commented 1 year ago

Hi, it splits the input FASTA wherever there is a run of 10 or more N's. If you look at the taxonomy report, it shows how split sequences are given seq-ids based on their split ranges. It will not change the reproducibility for the same version of the software.
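To illustrate the splitting rule, here is a minimal Python sketch. It is not the gx implementation. The function name split_on_n_runs, the 1-based `id~start..end` naming of the pieces, and the case-insensitive matching of N are all assumptions made for the example; the only detail taken from the help text is the threshold of at least 10 N's.

```python
import re

# Runs of at least 10 N's. Matching lowercase n as well is an assumption.
N_RUN = re.compile(r"[Nn]{10,}")


def split_on_n_runs(seq_id, seq):
    """Yield (sub_id, subsequence) for each stretch between long N-runs.

    The sub_id format (1-based, inclusive range) is hypothetical and only
    illustrates the idea of naming pieces after their split ranges.
    """
    start = 0
    for match in N_RUN.finditer(seq):
        if match.start() > start:
            # Emit the stretch that precedes this N-run.
            yield f"{seq_id}~{start + 1}..{match.start()}", seq[start:match.start()]
        start = match.end()
    if start < len(seq):
        # Emit whatever follows the last N-run.
        yield f"{seq_id}~{start + 1}..{len(seq)}", seq[start:]


# A run of 12 N's triggers a split; a run of 3 N's does not.
pieces = list(split_on_n_runs("chr1", "ACGT" + "N" * 12 + "GGCC" + "N" * 3 + "TT"))
for sub_id, sub_seq in pieces:
    print(sub_id, sub_seq)
```

In this sketch, short N-runs stay inside a piece, so only gaps of 10 or more N's produce separate sequences.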

There was a bug in this version: the parameter was set to True by default, which is why you didn't see a difference. In our next release, there will be an option to turn this on and off.

Thank you for your feedback. It really helps us.

tillenglert commented 1 year ago

Hi,

Thanks for clearing that up. I was confused by the N-runs!