ncbi / fcs

Foreign Contamination Screening caller scripts and documentation

What's the effect of "--allow-same-species" option in the "fcs gx screen genome"? #53

Closed maruiqi0710 closed 9 months ago

maruiqi0710 commented 9 months ago

I noticed that there is a --allow-same-species option in fcs gx screen genome. What is the effect of this option? Will it lead to stricter screening conditions (more sequences are assumed to be contaminants) or looser screening conditions (fewer sequences are assumed to be contaminants)?

etvedte commented 9 months ago

Hello,

The --allow-same-species parameter determines whether GX reports alignments to database sequences from organisms with the same tax-id as the one supplied by the user. It is turned on by default and, for most use cases, should be left that way.

If you set --allow-same-species=F, one potential effect is that more contaminants get reported, particularly when the database has poor taxonomic representation close to the source genome. So, in the terms of your question, turning it off can make the screening stricter. The setting can still be useful when you suspect that the database sequences themselves might be contaminated.

You can also achieve the same effect by setting the environment variable GX_ALIGN_EXCLUDE_TAXA=tax-id, where tax-id is the taxonomic identifier of the source genome (see: https://github.com/ncbi/fcs/wiki/FCS-GX#environment-variables).
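
As a minimal sketch of the two routes described above (an illustration, not taken from the FCS documentation): the Python helpers below either append --allow-same-species=F to a command line or set GX_ALIGN_EXCLUDE_TAXA in a copy of the environment. The function names, the placeholder base command, and the example tax-id 9606 are assumptions; substitute your real FCS-GX invocation and the tax-id of your source genome.

```python
import os


def with_same_species_disabled(base_cmd):
    """Return a copy of base_cmd with --allow-same-species=F appended."""
    return list(base_cmd) + ["--allow-same-species=F"]


def env_excluding_taxon(tax_id, env=None):
    """Return a copy of env (default: os.environ) with GX_ALIGN_EXCLUDE_TAXA set."""
    new_env = dict(os.environ if env is None else env)
    new_env["GX_ALIGN_EXCLUDE_TAXA"] = str(tax_id)
    return new_env


# Placeholder command (assumption): replace with your actual screening invocation.
base_cmd = ["fcs", "gx", "screen", "genome"]
source_tax_id = 9606  # example only; use the tax-id of your source genome

# Route 1: pass the option on the command line.
cmd_with_flag = with_same_species_disabled(base_cmd)

# Route 2: leave the command unchanged and set the environment variable.
env_with_exclusion = env_excluding_taxon(source_tax_id)

print(" ".join(cmd_with_flag))
print("GX_ALIGN_EXCLUDE_TAXA=" + env_with_exclusion["GX_ALIGN_EXCLUDE_TAXA"])
```

Nothing is executed by the sketch itself. Either result could then be handed to subprocess.run, for example subprocess.run(cmd_with_flag) for the first route or subprocess.run(base_cmd, env=env_with_exclusion) for the second.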

etvedte commented 9 months ago

Closing. Please re-open if additional assistance is needed.