mdshw5 / pyfaidx

Efficient pythonic random access to fasta subsequences
https://pypi.python.org/pypi/pyfaidx

--split-files #154

Closed nservant closed 2 years ago

nservant commented 4 years ago

Hi, the --split-files option seems to change the sequence name in some cases. Here is a simple example:

>>more test.fa 
>HPV16_144-1
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>HPV55-12
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
>>faidx --split-files test.fa
>>ls
HPV16_1441.fa  HPV5512.fa  test.fa  test.fa.fai

Is there a way to keep the original names? Thanks.

mdshw5 commented 4 years ago

Thanks for raising this issue. Indeed, I was a bit too strict with filename sanitization: https://github.com/mdshw5/pyfaidx/blob/843f69c1acbee3838081d7200438bbe49a95a88e/pyfaidx/cli.py#L9

https://github.com/mdshw5/pyfaidx/blob/843f69c1acbee3838081d7200438bbe49a95a88e/pyfaidx/cli.py#L38-L39
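For illustration, the renaming in the report is what a character whitelist without the hyphen would produce. The sketch below is not the code at the links above; `strict_sanitize` is a hypothetical name, and the whitelist (alphanumerics plus underscore) is an assumption chosen only because it reproduces the file names from the example.

```python
def strict_sanitize(name):
    """Hypothetical over-strict sanitizer: keep letters, digits, and underscores only."""
    # Assumption: any character outside this whitelist (including '-') is dropped.
    return "".join(c for c in name if c.isalnum() or c == "_")


# Sequence names taken from the test.fa example in the report.
for header in ("HPV16_144-1", "HPV55-12"):
    print(header, "->", strict_sanitize(header) + ".fa")
```

With these two headers the hyphen is silently removed, which matches the HPV16_1441.fa and HPV5512.fa files listed in the report.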

I'll replace the existing logic with something more robust, such as the filename sanitization used in the Django project.
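A rough sketch of what a more permissive sanitizer could look like, assuming the Django helper in question behaves like `django.utils.text.get_valid_filename` (strip surrounding whitespace, turn spaces into underscores, then drop anything that is not a word character, hyphen, or dot). The function name `permissive_sanitize` is hypothetical, and this is not necessarily the change that went into pyfaidx.

```python
import re


def permissive_sanitize(name):
    """Hypothetical sanitizer modeled on Django's get_valid_filename."""
    # Trim surrounding whitespace and replace inner spaces with underscores.
    s = str(name).strip().replace(" ", "_")
    # Drop every character that is not a word character, '-', or '.'.
    return re.sub(r"[^-\w.]", "", s)


# Sequence names from the report keep their hyphens.
for header in ("HPV16_144-1", "HPV55-12"):
    print(header, "->", permissive_sanitize(header) + ".fa")
```

Characters that are unsafe in file names, such as a path separator, are still removed, so a header like "chr1 some/desc" would become "chr1_somedesc".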

mdshw5 commented 4 years ago

@nservant If you care to test these changes, you can install the current master branch with: pip install -e git+https://github.com/mdshw5/pyfaidx.git#egg=pyfaidx