ncbi / fcs

Foreign Contamination Screening caller scripts and documentation

[FEATURE REQUEST]: <title> #30

Closed accopeland closed 1 year ago

accopeland commented 1 year ago

Is this a feature request for: FCS-adaptor

Describe the problem you'd like to be solved: Is the vector / adaptor database available separately? If yes, a pointer in the docs would be welcome.

Describe the solution you'd like: Just a mention in the docs about the location of the database.

Describe alternatives you've considered: Digging around in the container and associated scripts.

accopeland commented 1 year ago

I see that there is an /app/data/CommonContaminants directory in the container. Still, it would be useful to have some documentation about what's there, how it's curated, whether it's available outside the container, etc.

etvedte commented 1 year ago

Hello,

We are working on a solution for this request. The BLAST db is indeed available in the /app/data/CommonContaminants directory, as you mentioned above, and you could presumably copy it to your local filesystem. Are you requesting that the FASTA underlying the db be made available? We do have some sequences available at the locations below, although we need to update the euks file to bring it in sync with the FCS-adaptor db.

https://ftp.ncbi.nlm.nih.gov/pub/kitts/adaptors_for_screening_euks.fa
https://ftp.ncbi.nlm.nih.gov/pub/kitts/adaptors_for_screening_proks.fa
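For anyone who wants a quick look at what those two files contain, here is a minimal Python sketch that downloads them and counts the FASTA records in each. The function names (`fasta_headers`, `fetch_text`, `summarize`) are made up for illustration, and the sketch assumes the files are plain-text FASTA reachable over HTTPS; it is not part of FCS.

```python
import urllib.request

# The two URLs quoted in this comment.
ADAPTOR_FASTA_URLS = [
    "https://ftp.ncbi.nlm.nih.gov/pub/kitts/adaptors_for_screening_euks.fa",
    "https://ftp.ncbi.nlm.nih.gov/pub/kitts/adaptors_for_screening_proks.fa",
]


def fasta_headers(text):
    """Return the header lines (without the leading '>') of a FASTA string."""
    return [line[1:].strip() for line in text.splitlines() if line.startswith(">")]


def fetch_text(url, timeout=30):
    """Download a URL and decode it as text (assumption: plain ASCII/UTF-8 FASTA)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def summarize(urls=ADAPTOR_FASTA_URLS):
    """Map each URL to its number of FASTA records. Requires network access."""
    return {url: len(fasta_headers(fetch_text(url))) for url in urls}
```

Calling `summarize()` returns a dict of record counts keyed by URL; `fasta_headers(fetch_text(url))` lists the sequence names in one file.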

The FCS-adaptor db sequences are derived in large part from the NCBI UniVec database. You can find information regarding the construction of UniVec at https://www.ncbi.nlm.nih.gov/tools/vecscreen/univec/. In the future we can update the FCS-adaptor wiki with relevant info about curation.
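As for the BLAST db inside the container, one possible way to recover its underlying FASTA is the `blastdbcmd` tool from BLAST+, assuming it is installed wherever the db files are readable. The Python sketch below only builds the command line. The db base name is not given in this thread, so `HYPOTHETICAL_DB_NAME` is a placeholder, and `-dbtype nucl` assumes a nucleotide db.

```python
import subprocess


def blastdbcmd_dump_command(db_path, out_fasta):
    """Build a blastdbcmd call that writes every entry of a BLAST db as FASTA.

    db_path is the db base name (path without the file extensions).
    Assumption: the db is a nucleotide db, hence -dbtype nucl.
    """
    return [
        "blastdbcmd",
        "-db", db_path,
        "-dbtype", "nucl",
        "-entry", "all",   # dump all sequences in the db
        "-out", out_fasta,
    ]


def dump_blast_db(db_path, out_fasta):
    """Run the command; raises CalledProcessError if blastdbcmd fails."""
    subprocess.run(blastdbcmd_dump_command(db_path, out_fasta), check=True)


# Placeholder path: the directory is from this thread, the db name is NOT.
EXAMPLE_DB = "/app/data/CommonContaminants/HYPOTHETICAL_DB_NAME"
```

Running `dump_blast_db(EXAMPLE_DB, "fcs_adaptor_db.fa")` with the real db name substituted would produce a FASTA file that can be compared against the files on the FTP site.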