jorvis / biocode

Bioinformatics code libraries and scripts
MIT License

rename fasta headers with regex #77

Open CSynodinos opened 2 years ago

CSynodinos commented 2 years ago

I am proposing a script for renaming FASTA headers with regex. Header IDs and descriptions can be changed either together or independently of each other.
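To illustrate the idea of editing the ID and the description separately, here is a minimal sketch. It assumes the common `>ID description` header layout, where the ID is everything up to the first whitespace. The function `rename_header` and its parameters are hypothetical and are not taken from `header_renamer.py`.

```python
import re


def rename_header(header, id_pattern=None, id_repl="", desc_pattern=None, desc_repl=""):
    """Apply optional regex substitutions to the ID and/or description of one header.

    Hypothetical helper, not the API of header_renamer.py. Assumes a single header
    line with any trailing newline already stripped, in '>ID description' form.
    """
    body = header[1:] if header.startswith(">") else header
    # Split on the first run of whitespace: ID on the left, description on the right.
    parts = body.split(None, 1)
    seq_id = parts[0] if parts else ""
    desc = parts[1] if len(parts) > 1 else ""
    # Each part is only touched if a pattern for it was given.
    if id_pattern is not None:
        seq_id = re.sub(id_pattern, id_repl, seq_id)
    if desc_pattern is not None:
        desc = re.sub(desc_pattern, desc_repl, desc)
    return ">" + seq_id + (" " + desc if desc else "")
```

For example, `rename_header(">seq1.2 old description", id_pattern=r"\.\d+$")` strips a version-like suffix from the ID and leaves the description untouched.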

jorvis commented 2 years ago

Not sure how I missed this! OK, I see the documented header-pattern CSV file, but could you give an example of using it? For instance, some example input headers, a CSV file, and the resulting exported headers? I was trying to work out from the documentation how you would handle possible duplicates in the output.
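As a rough picture of how a CSV of patterns could drive the renaming, here is a sketch. The column layout (`field,pattern,replacement`, with `field` being `id` or `description`) is purely an assumption for illustration. It is not necessarily the format `header_renamer.py` expects, and `load_rules`/`apply_rules` are hypothetical names.

```python
import csv
import io
import re


def load_rules(csv_text):
    """Read (field, compiled pattern, replacement) rules from CSV text.

    Assumed (hypothetical) layout: header row 'field,pattern,replacement',
    where field is 'id' or 'description'.
    """
    rules = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        field = row["field"].strip().lower()
        if field not in ("id", "description"):
            raise ValueError(f"unknown field: {row['field']!r}")
        rules.append((field, re.compile(row["pattern"]), row["replacement"]))
    return rules


def apply_rules(header, rules):
    """Apply every rule, in file order, to its part of a '>ID description' header."""
    body = header[1:] if header.startswith(">") else header
    parts = body.split(None, 1)
    fields = {
        "id": parts[0] if parts else "",
        "description": parts[1] if len(parts) > 1 else "",
    }
    for field, pattern, replacement in rules:
        fields[field] = pattern.sub(replacement, fields[field])
    desc = fields["description"]
    return ">" + fields["id"] + (" " + desc if desc else "")
```

Applying the rules in file order keeps the behaviour predictable when several patterns touch the same field. Note that nothing here prevents two different input headers from mapping to the same output header.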

CSynodinos commented 2 years ago

Hello Jorvis, sorry for the late response. I have attached some example files along with the output. The input FASTA is the COVID-19 Wuhan variant sequence copy-pasted several times, with every header except the first one altered.

The command I used was: python3 header_renamer.py -i header_test_file.fasta -cv patterns.csv -cnt true

As for duplicates, my solution was the -cnt argument, which adds a counter to each header in turn. It is not the most elegant solution, but it does make every header unique regardless of whether the input contains duplicates. Its default value is False.
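The counter idea can be sketched as follows. This assumes the counter is appended to the header ID with an underscore and numbered from 1. The exact position and format that `-cnt` uses are assumptions, and `add_counters` is a hypothetical name.

```python
def add_counters(headers, sep="_"):
    """Append a running 1-based counter to the ID of every header.

    Hypothetical sketch of the -cnt idea; the real suffix format may differ.
    Every record gets a different number after the last separator, so the
    resulting IDs are unique even when input IDs repeat (assuming sep is
    not a digit).
    """
    renamed = []
    for i, header in enumerate(headers, start=1):
        body = header[1:] if header.startswith(">") else header
        parts = body.split(None, 1)
        seq_id = parts[0] if parts else ""
        new_header = ">" + seq_id + sep + str(i)
        if len(parts) > 1:
            # Keep the description unchanged after the numbered ID.
            new_header += " " + parts[1]
        renamed.append(new_header)
    return renamed
```

Numbering every header, rather than only the repeated ones, is what makes uniqueness unconditional. The trade-off is that headers that were already unique also change. A variant that suffixes only duplicates would preserve those IDs but would have to check that a generated name does not collide with an existing one.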

example_files.zip