xiezhq / ISEScan

A python pipeline to identify IS (Insertion Sequence) elements in genome and metagenome
Apache License 2.0

Pangenome file such as inputfile #41

Leytoncito closed 2 years ago

Leytoncito commented 2 years ago

I was wondering whether it is advisable to use a pangenome file as input instead of individual genomes. Technically it works, but does it make biological sense to do this?

Thanks in advance.

xiezhq commented 2 years ago

It depends. ISEScan is an automated tool that independently detects and annotates IS elements and transposases in each of the input FASTA sequences. It does not care what kind of genomes those input sequences come from. It is up to the users of ISEScan to decide whether it makes biological sense to do so. If there are multiple sequences in the input FASTA file, ISEScan picks the first sequence, scans it, and detects the IS elements in it. It then moves on to the second sequence and does the same, and so on.
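The record-by-record behaviour described above can be illustrated with a small sketch. This is not ISEScan's own code: `iter_fasta` and `scan_each` are hypothetical helper names, and `len` stands in for the real detection step, purely to show that each sequence is handled on its own and in file order.

```python
import io
from typing import Callable, Dict, Iterator, TextIO, Tuple


def iter_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs in the order they appear in the file."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def scan_each(handle: TextIO, scan: Callable[[str], object]) -> Dict[str, object]:
    """Apply scan() to every record independently, one after another."""
    return {header: scan(seq) for header, seq in iter_fasta(handle)}


# Toy multi-FASTA input; len() is only a placeholder for a per-sequence scan.
example = io.StringIO(">seq1\nACGT\nAC\n>seq2\nGGCC\n")
lengths = scan_each(example, len)
print(lengths)  # {'seq1': 6, 'seq2': 4}
```

Because every record is scanned in isolation, the result for one sequence does not depend on which other sequences share the file.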

Leytoncito commented 2 years ago

Thanks for the reply. In this case, the file pangenoma.fa is a FASTA file of annotated sequences, including IS elements. The annotation is not specialized for IS elements. My idea was to confirm which of these sequences are indeed IS elements.

xiezhq commented 2 years ago

I don't understand what you mean. Is pangenoma.fa the input or the output of ISEScan? If it is the input file and ISEScan ran on it successfully, you can find the predicted IS elements in the ISEScan output files.

Leytoncito commented 2 years ago

Excuse my English. The pangenome.fa file is the result of a pangenomic analysis; it is a multi-FASTA file that gathers all the previously annotated sequences. Since the pangenome is the complete set of genes from a group of studied genomes, I thought that by using pangenome.fa as the input file for ISEScan I could avoid running ISEScan genome by genome.

xiezhq commented 2 years ago

Correct. You can put multiple sequences in one FASTA file and then run ISEScan on that FASTA file. In the output you will find the IS elements predicted for each sequence.

One thing you might like to know: the current version of ISEScan uses FragGeneScan to predict genes (and therefore transposases) in the input sequence, so in some cases the predicted genes may differ from annotations produced by other gene prediction tools.

xiezhq commented 2 years ago

If your focus is on processing lots of sequences, you can also take a look at the tips on how to run ISEScan in parallel on your Linux computer: https://github.com/xiezhq/ISEScan#lots-of-genomes.
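As a rough illustration of the idea of running several genomes at once, here is a minimal Python sketch. It is not the method from the linked README section. The `--seqfile`, `--output` and `--nthread` flags are assumed from the ISEScan README and should be checked against your installed version; `build_command`, `run_all` and the input file names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def build_command(fasta: str, outdir: str = "results", nthread: int = 2) -> List[str]:
    """Build one ISEScan command line (flag names are an assumption, see above)."""
    return ["isescan.py", "--seqfile", str(fasta),
            "--output", str(outdir), "--nthread", str(nthread)]


def run_all(fasta_files: Sequence[str], jobs: int = 2,
            runner: Callable = subprocess.run, **kwargs) -> list:
    """Run one command per input file, at most `jobs` at a time.

    `runner` defaults to subprocess.run; pass another callable for a dry run.
    Results come back in the same order as `fasta_files`.
    """
    commands = [build_command(f, **kwargs) for f in fasta_files]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(runner, commands))


# Dry run: the runner just returns each command, so nothing is executed.
for cmd in run_all(["genome1.fna", "genome2.fna"], runner=lambda c: c):
    print(" ".join(cmd))
```

Dropping the `runner` argument would launch the real commands. Keep `jobs` multiplied by `nthread` within the number of CPU cores available.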

Leytoncito commented 2 years ago

Thanks for the answers; it is clearer to me now.