mdshw5 / pyfaidx

Efficient pythonic random access to fasta subsequences
https://pypi.python.org/pypi/pyfaidx

Allow supplying no .bed file and creating just the index #203

Closed wbazant closed 1 year ago

wbazant commented 1 year ago

I am looking for a convenient way to create index files for FASTA files. My workflow will use the indexes later, so indexing is all I need. I tried bedtools first, but it required me to create a .bed file before I could use it, so I dug around and found this project.

This might not be in scope of how you want to develop the package, but what's the best way to only index the file? faidx $FASTA > /dev/null wastes I/O, and faidx $FASTA "" seems to almost do what I want, except it produces a warning.

For now I have:

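indexFasta(){
  # Request an empty region so only the index is built, then drop the resulting warning.
  local FASTA="$1"
  local ABS_DIR
  # Quote every expansion so paths containing spaces survive word splitting.
  ABS_DIR=$( cd "$(dirname "$FASTA")" && pwd )
  singularity run --bind "$ABS_DIR":/work \
    https://depot.galaxyproject.org/singularity/pyfaidx%3A0.7.1--pyh5e36f6f_0 \
    faidx "/work/$(basename "$FASTA")" "" 2>&1 | grep -v "warning:  not found in file"
}

wbazant commented 1 year ago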

I think I've found it!

faidx --invert-match /work/ESTs.fa

matches nothing, which is what I would like.
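
In case it helps anyone else, here is a minimal sketch of wrapping that call in a helper. The function name index_fasta_only is made up for illustration and is not part of pyfaidx. It assumes faidx is on the PATH and that the index is written next to the input as <file>.fai.

```shell
# Sketch: build only the .fai index for a FASTA file using pyfaidx's faidx CLI.
# index_fasta_only is a hypothetical helper name, not something pyfaidx provides.
index_fasta_only() {
  local fasta="$1"
  if [ ! -f "$fasta" ]; then
    echo "index_fasta_only: no such file: $fasta" >&2
    return 1
  fi
  # With --invert-match and no regions, nothing is selected for output
  # (the behavior reported above); stdout is discarded just in case.
  faidx --invert-match "$fasta" > /dev/null || return 1
  # Assumption: the index lands next to the FASTA as <fasta>.fai.
  [ -f "$fasta.fai" ]
}
```

Calling index_fasta_only ESTs.fa should then return success only if ESTs.fa.fai exists afterwards, which makes it easy to chain into a larger workflow with && or set -e.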

mdshw5 commented 1 year ago

@wbazant I'm glad you found the solution, and I agree that this behavior is not intuitive. Unfortunately, at this point I worry that someone, somewhere is relying on the existing behavior :). I've moved the example in the docs to the top of the usage block, since I think this is a common use case.