xiezhq / ISEScan

A python pipeline to identify IS (Insertion Sequence) elements in genome and metagenome
Apache License 2.0

Merge the sum files for condensed overview and heatmaps #53

Closed AhmedElsherbini closed 10 months ago

AhmedElsherbini commented 10 months ago

Hello, first of all, thanks for your nice tool. I like it :)

Since I work with many genomes per run, I needed a simple feature: condensing the sum files into a simple Excel sheet, one overall and one for heatmaps.

So I developed this simple script in two flavors: one that includes the copy number (nIS) for the whole analysis, and one that ignores it (meaning it uses only the first column of the sum file). I see that in the sum file you sum the copy number as the total number of IS elements per genome. Does this mean that I should use nIS for heatmaps and exclude the first script? What is your opinion?

Your feedback is welcome.

xiezhq commented 10 months ago

Hi AhmedElsherbini,

I am not sure what you want to know from me. For each ISEScan run, one *.sum file is created per input FASTA file (e.g. NC_012624.fna). If the input FASTA file contains multiple sequences, the 'nIS' column in the final row of the .sum file gives the total number of IS element copies identified across all sequences in that file.

Hope this answers your question.

Xie

AhmedElsherbini commented 10 months ago

Okay, thank you for your fast response. I will keep only the script that includes nIS in the heatmap.
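
For readers who want to try the merge discussed in this thread, here is a minimal sketch. It is not the script from the issue. It assumes each .sum file is whitespace-delimited with columns in the order seqid, family, nIS, followed by further columns, and that the final summary row starts with the word "total". Both are assumptions about the file layout and should be checked against your own output. The function names `parse_sum` and `merge_counts` are hypothetical. The `use_copy_number` switch mirrors the two flavors described above: summing nIS per family, or recording only whether a family is present.

```python
import csv
import io
from collections import defaultdict


def parse_sum(text, use_copy_number=True):
    """Return {family: count} for the text of one .sum file.

    Assumed layout (not confirmed by the thread): whitespace-delimited
    rows of 'seqid family nIS ...', with a final row starting with 'total'.
    """
    counts = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, header lines, and any row whose third
        # field is not a plain non-negative integer.
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        seqid, family, n_is = fields[0], fields[1], int(fields[2])
        # Skip the assumed summary row so totals are not double counted.
        if seqid == "total":
            continue
        if use_copy_number:
            counts[family] += n_is  # sum copies over all sequences
        else:
            counts[family] = 1      # presence/absence only
    return dict(counts)


def merge_counts(per_genome):
    """Build a genome x family table (list of rows) from parsed counts."""
    families = sorted({f for c in per_genome.values() for f in c})
    rows = [["genome"] + families]
    for genome in sorted(per_genome):
        counts = per_genome[genome]
        rows.append([genome] + [counts.get(f, 0) for f in families])
    return rows


def to_csv(rows):
    """Serialize the table as CSV text, which Excel can open directly."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Made-up example data in the assumed layout (not real ISEScan output).
SAMPLE_A = """seqid family nIS %Genome bps4IS dnaLen
seq1 IS3 2 0.50 2400 480000
seq1 IS5 1 0.20 1000 480000
seq2 IS3 1 0.30 1200 400000
total - 4 0.52 4600 880000
"""
SAMPLE_B = """seqid family nIS %Genome bps4IS dnaLen
seqX IS1 4 1.00 4000 400000
total - 4 1.00 4000 400000
"""

table = merge_counts({"a": parse_sum(SAMPLE_A), "b": parse_sum(SAMPLE_B)})
print(to_csv(table))
```

To run this on real files, the text of each file could be read with `pathlib.Path(path).read_text()` and passed to `parse_sum`, using the file name as the genome key. The resulting table has one row per genome and one column per IS family, which is the shape most heatmap tools expect.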