wilkelab / cinful

A fully automated pipeline to identify microcins along with their associated immunity proteins and export machinery
GNU General Public License v3.0

cinful_id instead of Unnamed: 0 #42

Closed: tijeco closed this issue 3 years ago

tijeco commented 3 years ago

This will cover several related things/issues. Basically, I have been writing dataframes to CSV with the sample ID in the (unnamed) index column. Something weird happened during the merging process that I'll have to diagnose: the sample ID contains the full path in some rows and not in others, which makes the file about twice as big as it needs to be. The full path doesn't need to be in the cinful_id; it only needs sample|contig|start|stop|strand. This will eventually be used to make FASTA files in sample/ with contig|start|stop|strand in the header (related to #37).
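A minimal sketch of the ID scheme described above, assuming the sample name has already been reduced to a plain name (no path). The helpers `make_cinful_id`, `split_cinful_id`, `fasta_by_sample` and `write_fasta_by_sample`, and the per-sample file name `candidates.fasta`, are hypothetical and not part of cinful. The sketch builds `sample|contig|start|stop|strand`, refuses fields that already contain `|` (which would make the ID ambiguous), and groups records into one FASTA per sample whose headers drop the sample field.

```python
import os
from collections import defaultdict

# Hypothetical helpers illustrating cinful_id = sample|contig|start|stop|strand.
SEP = "|"

def make_cinful_id(sample, contig, start, stop, strand):
    """Join the five ID fields with '|'; reject fields that already contain '|'."""
    fields = [str(sample), str(contig), str(start), str(stop), str(strand)]
    for field in fields:
        if SEP in field:
            raise ValueError(f"field {field!r} contains {SEP!r}")
    return SEP.join(fields)

def split_cinful_id(cinful_id):
    """Inverse of make_cinful_id: return (sample, contig, start, stop, strand)."""
    sample, contig, start, stop, strand = cinful_id.split(SEP)
    return sample, contig, int(start), int(stop), strand

def fasta_by_sample(records):
    """Group (cinful_id, sequence) pairs by sample.

    Returns {sample: fasta_text}; each header drops the leading sample field,
    leaving contig|start|stop|strand.
    """
    grouped = defaultdict(list)
    for cinful_id, seq in records:
        sample, rest = cinful_id.split(SEP, 1)
        grouped[sample].append(f">{rest}\n{seq}\n")
    return {sample: "".join(entries) for sample, entries in grouped.items()}

def write_fasta_by_sample(records, outdir, filename="candidates.fasta"):
    """Write one FASTA per sample to outdir/<sample>/<filename> (file name is hypothetical)."""
    for sample, text in fasta_by_sample(records).items():
        sample_dir = os.path.join(outdir, sample)
        os.makedirs(sample_dir, exist_ok=True)
        with open(os.path.join(sample_dir, filename), "w") as fh:
            fh.write(text)
```

On the CSV side, pandas names an unlabeled index column `Unnamed: 0` when the file is read back. Giving the index a name before writing (for example `df.index.name = "cinful_id"`), or moving the ID into a regular column and calling `df.to_csv(path, index=False)`, keeps the header as `cinful_id`.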

tijeco commented 3 years ago

Well, I figured out why some IDs were weird: there were unknown .fna files in nested directories, so their IDs contained strange file paths. That being said, perhaps the directory structure needs to be maintained... I'm not really sure.
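One way to weigh the two options is sketched below, assuming POSIX-style paths and a single `.fna` suffix; `sample_id_from_path` and `find_collisions` are hypothetical helpers, not cinful functions. Dropping directories gives short sample IDs but can collide when two nested folders hold files with the same name. Keeping the path relative to the input root, joined with `__` (a separator chosen here so it never clashes with the `|` used in cinful_id), preserves the directory structure without embedding the absolute path.

```python
from pathlib import PurePosixPath

def sample_id_from_path(fna_path, input_root, keep_dirs=False):
    """Derive a sample ID from an .fna file located under input_root.

    keep_dirs=False -> file stem only, e.g. "genome1"
    keep_dirs=True  -> relative path joined with "__", e.g. "batch1__genome1"
    Raises ValueError if fna_path is not under input_root.
    """
    rel = PurePosixPath(fna_path).relative_to(input_root).with_suffix("")
    if not keep_dirs:
        return rel.name
    return "__".join(rel.parts)

def find_collisions(fna_paths, input_root):
    """Return {stem: [paths]} for stems shared by more than one file when directories are dropped."""
    seen = {}
    for path in fna_paths:
        seen.setdefault(sample_id_from_path(path, input_root), []).append(path)
    return {sid: paths for sid, paths in seen.items() if len(paths) > 1}
```

Running `find_collisions` over the discovered inputs first would show whether stems alone are unique; if they are not, the `keep_dirs=True` form avoids merging two different genomes under one sample ID.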