pachterlab / kma

Keep Me Around: Intron Retention Detection
GNU General Public License v2.0

Remove biopython dependency, use .fai indexing with pyfaidx #24

Open: mdshw5 opened this pull request 5 years ago

mdshw5 commented 5 years ago

Thanks for sharing your analysis. I plan on using this in the new year, and noticed that running the pre-processing scripts on our shared HPC cluster resulted in permissions errors for the shared reference genome assemblies. The problem is that pyfasta creates its own index sidecar files, and fails if the user cannot write to the shared reference directory. I've swapped out pyfasta for my pyfaidx module, which reads an existing samtools .fai index file or creates one if it is missing. On a shared cluster, that .fai file is usually already present next to the assembly, so nothing needs to be written. This also allowed me to drop the biopython dependency, since pyfaidx has a FASTA line-wrapping function.
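To illustrate why a .fai index avoids the permissions problem, here is a minimal, standard-library-only sketch of .fai-based random access. It is not pyfaidx's API or this PR's code: `read_fai`, `build_fai`, `load_index`, `fetch`, and `to_fasta` are hypothetical helpers. The sketch assumes the standard five-column samtools .fai layout (name, length, byte offset, bases per line, bytes per line) and uniform line lengths within each record. Unlike pyfaidx, which writes a .fai when none exists, it falls back to building the index in memory, so it never writes into the reference directory.

```python
import os
from typing import Dict, NamedTuple


class FaiEntry(NamedTuple):
    """One record of a samtools .fai index (first five columns)."""
    length: int     # sequence length in bases
    offset: int     # byte offset of the first base in the FASTA file
    linebases: int  # bases per sequence line
    linewidth: int  # bytes per sequence line, including the newline


def read_fai(fai_path: str) -> Dict[str, FaiEntry]:
    """Parse an existing .fai file (read-only)."""
    index = {}
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            length, offset, linebases, linewidth = map(int, fields[1:5])
            index[fields[0]] = FaiEntry(length, offset, linebases, linewidth)
    return index


def build_fai(fasta_path: str) -> Dict[str, FaiEntry]:
    """Compute the same index in memory; assumes uniform line lengths per record."""
    index = {}
    name, length, offset, linebases, linewidth = None, 0, 0, 0, 0
    pos = 0  # running byte position in the file
    with open(fasta_path, "rb") as handle:
        for raw in handle:
            if raw.startswith(b">"):
                if name is not None:
                    index[name] = FaiEntry(length, offset, linebases, linewidth)
                name = raw[1:].split()[0].decode()  # name = first word of header
                length, linebases, linewidth = 0, 0, 0
                offset = pos + len(raw)  # sequence starts right after the header
            elif name is not None:
                bases = len(raw.rstrip(b"\r\n"))
                if linebases == 0:  # first sequence line defines the layout
                    linebases, linewidth = bases, len(raw)
                length += bases
            pos += len(raw)
    if name is not None:
        index[name] = FaiEntry(length, offset, linebases, linewidth)
    return index


def load_index(fasta_path: str) -> Dict[str, FaiEntry]:
    """Use a sibling .fai if present; otherwise index in memory without writing."""
    fai_path = fasta_path + ".fai"
    if os.path.exists(fai_path):
        return read_fai(fai_path)
    return build_fai(fasta_path)


def fetch(fasta_path: str, index: Dict[str, FaiEntry], name: str,
          start: int, end: int) -> str:
    """Return bases [start, end) of `name` (0-based, half-open) via seek()."""
    entry = index[name]
    end = min(end, entry.length)
    if start >= end:
        return ""

    def byte_pos(i: int) -> int:
        # Whole lines before base i, plus the column of i within its line.
        return (entry.offset + (i // entry.linebases) * entry.linewidth
                + i % entry.linebases)

    first, last = byte_pos(start), byte_pos(end - 1) + 1
    with open(fasta_path, "rb") as handle:
        handle.seek(first)
        raw = handle.read(last - first)
    return raw.decode("ascii").replace("\n", "").replace("\r", "")


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a record with sequence lines of at most `width` bases."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "".join(line + "\n" for line in lines)
```

Only the FASTA file (and a .fai, if one exists) is ever opened, and only for reading. That read-only access is the property that matters on a shared cluster.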

On the example dataset, the output produced by the code in this PR is identical to the output of the current master branch.