barbaracania opened this issue 2 years ago
Hi @barbaracania, thanks for reaching out. There are a couple of things going on here, so I'll try to address them in chronological order:
1. Have you merged contigs from different assemblies, potentially from different strains/species? If so, this could cause severe issues for Prodigal's gene prediction, which in turn would make it hard to detect Platon's marker protein sequences (MPS).
2. --characterize leads to a full characterization of all contigs and therefore deactivates any filtering. Hence, this option can be used to gain information on any contig, no matter whether it is chromosome- or plasmid-borne. In --characterize mode, Platon doesn't classify contigs but characterizes all of them.

Could you provide some information on your data: metagenome or isolate? Merged assemblies?
Best regards!
Thank you for your answer. My data is metagenomic, but the samples were treated with a plasmid-safe DNAse, so it should contain mostly plasmid reads. I ran SPAdes on it with the --metaplasmid option, and afterwards I only modified the names of contigs by removing all the information after the coverage, as otherwise Platon was not able to read the coverage correctly from them. Without the modification, the names look like this: >NODE_1_length_63294_cov_26.832935_cutoff_20_type_circular. The data was not modified in any other way. As it is suggested that the contigs produced by metaplasmidSPAdes should still be confirmed as plasmids by additional means, I thought of including Platon in my pipeline for this purpose.
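The header renaming described above can be scripted. This is a minimal sketch, assuming every header follows the metaplasmidSPAdes pattern quoted above (NODE_<n>_length_<len>_cov_<cov>, optionally followed by suffixes such as _cutoff_20_type_circular); the function names are hypothetical and not part of SPAdes or Platon.

```python
import re

# Matches the ">NODE_<n>_length_<len>_cov_<float>" prefix of a SPAdes contig header.
HEADER_PREFIX = re.compile(r"^(>NODE_\d+_length_\d+_cov_[0-9.]+)")


def strip_header_suffix(line: str) -> str:
    """Drop everything after the coverage value in a FASTA header line.

    Sequence lines and headers that do not match the pattern are returned unchanged.
    """
    match = HEADER_PREFIX.match(line)
    return match.group(1) if match else line


def rename_contigs(in_path: str, out_path: str) -> None:
    """Write a copy of a FASTA file with shortened contig headers (hypothetical helper)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(strip_header_suffix(line.rstrip("\n")) + "\n")
```

For example, rename_contigs("contigs.fasta", "contigs.renamed.fasta") would turn >NODE_1_length_63294_cov_26.832935_cutoff_20_type_circular into >NODE_1_length_63294_cov_26.832935 (the output file name is only an illustration).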
Just to be clear: I understand that the --characterize option only characterizes the contigs and does not classify them. I used it only to get an idea about my data and also to show it to you. When I was testing the three different modes, I was not using this option. For example, when I used
platon contigs.fasta --db ~/Databases/db --output platon_accu --mode accuracy --threads 8
my contigs.tsv file starts like this:
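| ID | Length | Coverage | # ORFs | RDS | Circular | Inc Type(s) | # Replication | # Mobilization | # OriT | # Conjugation | # AMRs | # rRNAs | # Plasmid Hits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NODE_1_length_63165_cov_26.834275 | 63165 | 26.8 | 48 | 0.0 | yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NODE_1_length_51546_cov_2.360878 | 51546 | 2.4 | 74 | 0.0 | yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NODE_2_length_32011_cov_1.484036 | 32011 | 1.5 | 39 | 0.0 | yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NODE_3_length_19747_cov_141.934964 | 19747 | 141.9 | 3 | 0.0 | yes | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |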
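To check all contigs at once rather than only the first rows, the contigs.tsv can be filtered programmatically. This is a minimal sketch, assuming the tab-separated layout and exact column names of the header shown above; the function name summarize is hypothetical, and "plasmid-associated hits" here simply means the replication, mobilization, oriT, conjugation and plasmid-hit counts (AMR and Inc type columns are deliberately ignored).

```python
import csv
from typing import Iterable, List, Tuple

# Count columns taken from the contigs.tsv header shown above.
PLASMID_COLUMNS = [
    "# Replication",
    "# Mobilization",
    "# OriT",
    "# Conjugation",
    "# Plasmid Hits",
]


def summarize(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (IDs with no plasmid-associated hits, IDs with at least one rRNA hit)."""
    reader = csv.DictReader(lines, delimiter="\t")
    no_hits: List[str] = []
    with_rrna: List[str] = []
    for row in reader:
        if all(int(row[col]) == 0 for col in PLASMID_COLUMNS):
            no_hits.append(row["ID"])
        if int(row["# rRNAs"]) > 0:
            with_rrna.append(row["ID"])
    return no_hits, with_rrna
```

Usage, assuming the output directory from the command above: with open("platon_accu/contigs.tsv") as fh: no_hits, with_rrna = summarize(fh).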
My contigs.chromosome.fasta contains only the first two contigs from my previous post, which Platon did not identify as circular, and contigs.plasmid.fasta contains everything else, including the contig on which the rRNA genes were found. When I try the sensitivity mode, I get the same results as with the accuracy mode, but the specificity mode gives me empty contigs.tsv and contigs.plasmid.fasta files, while all the contigs end up in contigs.chromosome.fasta. From what I understood, the accuracy mode should take all the contig characteristics into consideration when deciding whether a contig comes from a plasmid or a chromosome, while the other two modes rely only on the RDS values. Since all my RDS values are 0.0, I am confused why I am getting the results described above...
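One way to see exactly which contigs change classification between modes is to compare the IDs in each run's contigs.plasmid.fasta. This is a minimal sketch; the helper names and the second output directory (platon_spec) are hypothetical, and a contig ID is assumed to be the header text up to the first whitespace.

```python
from typing import Dict, Iterable, List, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect sequence IDs (header text up to the first whitespace) from FASTA lines."""
    ids: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:  # skip malformed headers that consist of ">" only
                ids.add(parts[0])
    return ids


def mode_differences(plasmids_a: Set[str], plasmids_b: Set[str]) -> Dict[str, List[str]]:
    """Report contigs called 'plasmid' by only one of two Platon runs."""
    return {
        "only_in_a": sorted(plasmids_a - plasmids_b),
        "only_in_b": sorted(plasmids_b - plasmids_a),
    }
```

For example, opening platon_accu/contigs.plasmid.fasta and platon_spec/contigs.plasmid.fasta and passing both handles through fasta_ids before calling mode_differences would list the contigs that only the accuracy run called plasmids.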
Hi,
could you repeat your analysis using the --meta option? This is not yet available in the latest official release (v1.6), but it is available in the main branch. You can install it into your environment via:
git clone https://github.com/oschwengers/platon.git
python -m pip install --no-deps --ignore-installed platon/
Without further information I cannot figure out what is causing this behaviour, but Prodigal will certainly not work perfectly without the --meta option set, as it then assumes a single genome.
Another reason could be that Platon simply cannot detect any marker proteins within your metagenome contigs. To check this, I'd need the <prefix>.json file.
Hi,
Thank you very much for trying to help me with my issue! I tried the --meta option, but the results are all the same. Here is the .json file produced with the command:
platon contigs.fasta --db ~/Databases/db --output platon_accu_meta --meta --mode accuracy --threads 8
contigs.json.zip
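Running all three modes in one go, with or without --meta, can be scripted so that the outputs are directly comparable. This is a minimal Python sketch that only uses the Platon flags appearing in this thread; the helper names, the per-mode output directory naming and the default of 8 threads are assumptions, and --meta needs the main-branch build mentioned above.

```python
import os
import subprocess
from typing import List

MODES = ("sensitivity", "accuracy", "specificity")


def platon_command(fasta: str, db: str, output: str, mode: str,
                   threads: int = 8, meta: bool = False) -> List[str]:
    """Assemble a Platon command line from the flags used in this thread."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    cmd = [
        "platon", fasta,
        "--db", os.path.expanduser(db),  # subprocess does not expand "~" itself
        "--output", output,
        "--mode", mode,
        "--threads", str(threads),
    ]
    if meta:
        cmd.append("--meta")  # not part of the v1.6 release according to this thread
    return cmd


def run_all_modes(fasta: str, db: str, prefix: str, meta: bool = True) -> None:
    """Run Platon once per mode, writing to hypothetical <prefix>_<mode> directories."""
    for mode in MODES:
        subprocess.run(platon_command(fasta, db, f"{prefix}_{mode}", mode, meta=meta),
                       check=True)
```

For example, run_all_modes("contigs.fasta", "~/Databases/db", "platon_meta") would create platon_meta_sensitivity, platon_meta_accuracy and platon_meta_specificity, provided Platon is on the PATH.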
Hi, indeed not a single marker protein could be detected on your contigs, which is odd/interesting and hasn't occurred so far, at least not for an entire dataset. However, we do not have much experience with metagenome data so far.
So in principle, there are 2 different reasons that I can think of:
contigs.log file?
Good morning,
Sure! Here is the .log file from the same run: contigs.log
I took a look at the logs, and from a technical perspective everything is fine. However, there is indeed not a single BLAST (DIAMOND) hit against the marker protein database, which so far has not occurred (at least not that I knew of). This is very interesting and helpful to know in terms of metagenome analysis with Platon!
As mentioned above, I'm currently computing and compiling a database update which could help here; of course, this would require further investigation. As of today, it seems that Platon is not the right tool for your dataset. May I refer you to PlasFlow? Since Platon was initially developed with single isolates in mind, PlasFlow might provide better results, as it specifically addresses metagenome data.
I'll leave this open until we've released the new database version and Platon v1.7, just to let you know. Again, thanks for trying Platon and reporting this! Best regards!
Hi! I am trying to use Platon 1.6, installed with Bioconda, to identify plasmid contigs. By running the following command:
platon contigs.fasta --db ~/Databases/db --output platon_accu --mode accuracy --threads 8 --characterize
I got the following result (I am showing the first few lines):
| ID | Length | Coverage | # ORFs | RDS | Circular | Inc Type(s) | # Replication | # Mobilization | # OriT | # Conjugation | # AMRs | # rRNAs | # Plasmid Hits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NODE_1_length_66028_cov_26.537579 | 66028 | 26.5 | 50 | 0.0 | no | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NODE_1_length_63294_cov_26.832935 | 63294 | 26.8 | 48 | 0.0 | no | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NODE_1_length_63165_cov_26.834275 | 63165 | 26.8 | 48 | 0.0 | yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NODE_1_length_51546_cov_2.360878 | 51546 | 2.4 | 74 | 0.0 | yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NODE_2_length_32011_cov_1.484036 | 32011 | 1.5 | 39 | 0.0 | yes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| NODE_3_length_19747_cov_141.934964 | 19747 | 141.9 | 3 | 0.0 | yes | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |
After running the same command without --characterize, the first two contigs are classified as chromosomal and the rest as plasmids. I am not sure whether this is a bug or whether I am misunderstanding how the RDS is calculated or how the classification criteria work, but the RDS value for all my contigs (over a thousand of them) is always 0.0. Moreover, rRNA genes appear to have been detected in the last contig shown, and its number of ORFs is very low, yet it was still classified as a plasmid. Lastly, when I tried the sensitivity mode, I got the same results as with the accuracy mode, but with the specificity mode all my contigs were classified as chromosomes. Is this expected behavior?