wilkelab / cinful

A fully automated pipeline to identify microcins along with their associated immunity proteins and export machinery
GNU General Public License v3.0

output fasta files #37

Closed tijeco closed 3 years ago

tijeco commented 3 years ago

I think this is better handled in the Python wrapper than in the Snakemake workflow: it generates a number of files that isn't known in advance, and dynamic rules can be a bit of a pain in Snakemake, so I don't think it's worth doing there.

Basically, take the top hits from the microcins CSV file and write them to a file named "microcin.{basename(sample)}.pep", with each header of the form ">contig|start:stop:strand".

The sequences can be sorted by bit score so that users know which is the best hit. Also, if any of these are not included in the three signal matches upstream or downstream of CvaB, those can be put in a "signalMatch_nearCvaB.pep" or something.
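To make the naming and header scheme concrete, here is a rough sketch of what the wrapper step could look like. It is not cinful's actual code: the function name `write_microcin_fastas` and the column names (`sample`, `contig`, `start`, `stop`, `strand`, `bitscore`, `seq`) are all assumptions for illustration, and the real microcins CSV may be laid out differently.

```python
import csv
import os
from collections import defaultdict


def write_microcin_fastas(csv_path, out_dir):
    """Write one microcin.{basename(sample)}.pep per sample, best bit score first.

    Hypothetical helper: every column name used below is an assumption.
    """
    # Group the hit rows by the basename of their sample path.
    by_sample = defaultdict(list)
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            by_sample[os.path.basename(row["sample"])].append(row)

    os.makedirs(out_dir, exist_ok=True)
    written = []
    for sample, rows in by_sample.items():
        # Highest bit score first, so the best hit is the first record.
        rows.sort(key=lambda r: float(r["bitscore"]), reverse=True)
        out_path = os.path.join(out_dir, f"microcin.{sample}.pep")
        with open(out_path, "w") as out:
            for r in rows:
                # Header format from the issue: >contig|start:stop:strand
                out.write(f">{r['contig']}|{r['start']}:{r['stop']}:{r['strand']}\n")
                out.write(r["seq"] + "\n")
        written.append(out_path)
    return written
```

Because the output paths are built in plain Python from whatever samples appear in the CSV, nothing needs to be declared to Snakemake ahead of time, which is the reason for keeping this in the wrapper.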

I'm imagining this would require a bit of merging with the nr_csv file and the all_hits file, as well as the signalMatch...