oschwengers / platon

Identification & characterization of bacterial plasmid-borne contigs from short-read draft assemblies.
https://doi.org/10.1099/mgen.0.000398
GNU General Public License v3.0

RDS is always 0.0 #31

Open barbaracania opened 2 years ago

barbaracania commented 2 years ago

Hi! I am trying to use Platon 1.6 installed with BioConda to identify plasmid contigs. By running the following command:

platon contigs.fasta --db ~/Databases/db --output platon_accu --mode accuracy --threads 8 --characterize

I got the following result (I am showing the first few lines):

ID	Length	Coverage	# ORFs	RDS	Circular	Inc Type(s)	# Replication	# Mobilization	# OriT	# Conjugation	# AMRs	# rRNAs	# Plasmid Hits
NODE_1_length_66028_cov_26.537579	66028	26.5	50	0.0	no	0	0	0	0	0	0	0	0
NODE_1_length_63294_cov_26.832935	63294	26.8	48	0.0	no	0	0	0	0	0	0	0	0
NODE_1_length_63165_cov_26.834275	63165	26.8	48	0.0	yes	0	0	0	0	0	0	0	0
NODE_1_length_51546_cov_2.360878	51546	2.4	74	0.0	yes	0	0	0	0	0	0	0	0
NODE_2_length_32011_cov_1.484036	32011	1.5	39	0.0	yes	0	0	0	0	0	0	0	0
NODE_3_length_19747_cov_141.934964	19747	141.9	3	0.0	yes	0	0	0	0	0	0	2	0

After running the same command without --characterize, the first two contigs are classified as chromosomal and the rest as plasmids. I am not sure whether this is a bug or whether I am misunderstanding how the RDS is calculated or how the classification criteria work, but the RDS value for all my contigs (over a thousand of them) is always 0.0. Moreover, rRNA genes were detected in the last contig shown and its number of ORFs is very low, yet it was still classified as a plasmid. Lastly, when I tried the sensitivity mode, I got the same results as with the accuracy mode, but in specificity mode all my contigs were classified as chromosomal. Is this expected behavior?

oschwengers commented 2 years ago

Hi @barbaracania, thanks for reaching out. There are a couple of things going on here, so I'll try to address them in order:

  1. For some reason, the first 4 contigs are denoted as NODE_1. Have you merged contigs from different assemblies, potentially from different strains/species? If so, this could cause severe issues for Prodigal's gene prediction, which in turn would make it hard to detect Platon's marker protein sequences (MPS).
  2. --characterize leads to a full characterization of all contigs and therefore deactivates any filtering. Hence, this option can be used to gain information on any contig, no matter whether it's chromosome- or plasmid-borne.
  3. The last contig indeed has 2 rRNAs detected. However, in --characterize mode, Platon doesn't classify contigs but characterizes all of them.
  4. It depends on the data; sometimes sensitivity and accuracy mode provide the same results. Also, in specificity mode Platon applies very strict classification rules to the RDS, and since the RDS is below the specificity threshold, it refuses to classify any of your contigs as plasmid. So yes, this is expected.
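
As a quick way to check point 1, the following minimal sketch counts how often each NODE index occurs in a list of contig IDs. It assumes SPAdes-style names beginning with NODE_<n>_; the function names are hypothetical and not part of Platon.

```python
import re
from collections import Counter

# Assumption: contig IDs follow the SPAdes pattern NODE_<n>_length_<bp>_cov_<x>.
NODE_RE = re.compile(r"^NODE_(\d+)_")

def node_index_counts(contig_ids):
    """Count how many contigs share each NODE index."""
    counts = Counter()
    for contig_id in contig_ids:
        match = NODE_RE.match(contig_id)
        if match:
            counts[int(match.group(1))] += 1
    return counts

def repeated_node_indices(contig_ids):
    """Return only the NODE indices that occur more than once."""
    return {n: c for n, c in node_index_counts(contig_ids).items() if c > 1}

# The six contig IDs from the table posted in this issue.
ids = [
    "NODE_1_length_66028_cov_26.537579",
    "NODE_1_length_63294_cov_26.832935",
    "NODE_1_length_63165_cov_26.834275",
    "NODE_1_length_51546_cov_2.360878",
    "NODE_2_length_32011_cov_1.484036",
    "NODE_3_length_19747_cov_141.934964",
]
print(repeated_node_indices(ids))
```

A repeated index does not prove that assemblies were merged; it only flags names worth a closer look.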

Could you provide some information on your data: Metagenome or isolate? Merged assemblies? Best regards!

barbaracania commented 2 years ago

Thank you for your answer. My data is metagenomic, but the samples were treated with a plasmid-safe DNAse, so it should contain mostly plasmid reads. I ran SPAdes on it with the --metaplasmid option, and afterwards I only modified the names of contigs by removing all the information after the coverage, as otherwise Platon was not able to read the coverage correctly from them. Without the modification, the names look like this: >NODE_1_length_63294_cov_26.832935_cutoff_20_type_circular. The data was not modified in any other way. As it is suggested that the contigs produced by metaplasmidSPAdes should still be confirmed as plasmids by additional means, I thought of including Platon in my pipeline for this purpose.
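
The renaming step described above could look roughly like the sketch below. It assumes every header follows the NODE_<n>_length_<bp>_cov_<x> pattern shown in this thread; the function names are hypothetical, and headers that do not match are left unchanged.

```python
import re

# Assumption: SPAdes-style header, optionally followed by extra fields such as
# _cutoff_20_type_circular that should be dropped.
HEADER_RE = re.compile(r"^(NODE_\d+_length_\d+_cov_\d+(?:\.\d+)?)")

def trim_header(name: str) -> str:
    """Cut a contig name right after the coverage value."""
    match = HEADER_RE.match(name)
    return match.group(1) if match else name

def trim_fasta(lines):
    """Yield FASTA lines with trimmed headers; sequence lines pass through."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + trim_header(line[1:].strip()) + "\n"
        else:
            yield line

print(trim_header("NODE_1_length_63294_cov_26.832935_cutoff_20_type_circular"))
```

To rewrite a whole file, the lines of an opened input FASTA can be passed to trim_fasta and the result written to a new file.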

Just to make this clear, I understand that using the --characterize option for Platon gives only info about contigs. I used it only to get an idea about my data and also to show it to you. When I was testing the three different modes, I was not using this option. For example, when I used

platon contigs.fasta --db ~/Databases/db --output platon_accu --mode accuracy --threads 8

my contigs.tsv file starts like this:

ID	Length	Coverage	# ORFs	RDS	Circular	Inc Type(s)	# Replication	# Mobilization	# OriT	# Conjugation	# AMRs	# rRNAs	# Plasmid Hits
NODE_1_length_63165_cov_26.834275	63165	26.8	48	0.0	yes	0	0	0	0	0	0	0	0
NODE_1_length_51546_cov_2.360878	51546	2.4	74	0.0	yes	0	0	0	0	0	0	0	0
NODE_2_length_32011_cov_1.484036	32011	1.5	39	0.0	yes	0	0	0	0	0	0	0	0
NODE_3_length_19747_cov_141.934964	19747	141.9	3	0.0	yes	0	0	0	0	0	0	2	0

My contigs.chromosome.fasta contains only the first two contigs from my previous post that were not identified by Platon as circular, and the contigs.plasmid.fasta has everything else, including the contig on which the rRNA genes were found. When I try the sensitivity mode, I get the same results as with the accuracy mode, but the specificity mode gives me empty contigs.tsv and contigs.plasmid.fasta files, while all the contigs are found in the contigs.chromosome.fasta. From what I understood, the accuracy mode should take all the contig characteristics into consideration when making a choice if a contig comes from a plasmid or a chromosome, while the other two modes are relying only on the RDS values. Since all my RDS values are 0.0, I am confused why I am getting the above-described results...
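
To confirm that every RDS value really is 0.0 across more than a thousand contigs, a small check along these lines may help. It assumes the output is tab-separated with a header row containing an RDS column, as in the excerpt above; rds_summary is a hypothetical helper, not part of Platon.

```python
import csv
import io

def rds_summary(tsv_text: str) -> dict:
    """Count contigs and how many of them have a non-zero RDS."""
    # Assumption: tab-separated text whose header row includes an "RDS" column.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    values = [float(row["RDS"]) for row in reader]
    return {
        "contigs": len(values),
        "nonzero_rds": sum(1 for v in values if v != 0.0),
    }

# Shortened example with only three of the columns shown in this thread.
example = (
    "ID\tLength\tRDS\n"
    "NODE_1_length_63165_cov_26.834275\t63165\t0.0\n"
    "NODE_3_length_19747_cov_141.934964\t19747\t0.0\n"
)
print(rds_summary(example))
```

For a real run, the text would come from reading the contigs.tsv file written to the output directory.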

oschwengers commented 2 years ago

Hi, could you repeat your analysis using the --meta option? It is not yet available in the latest official release (v1.6), but it is available in the main branch. You can install it into your environment via:

git clone https://github.com/oschwengers/platon.git
python -m pip install --no-deps --ignore-installed platon/

Without further information I cannot figure out what is causing this behaviour, but Prodigal will certainly not work perfectly without the meta option set, as it assumes a single genome. Another reason could be that Platon simply cannot detect any marker proteins within your metagenome contigs. To check this, I'd need the <prefix>.json.

barbaracania commented 2 years ago

Hi, Thank you very much for trying to help me with my issue! I tried the --meta option, but the results seem to be all the same. Here is the .json file produced with the command

platon contigs.fasta --db ~/Databases/db --output platon_accu_meta --meta --mode accuracy --threads 8

contigs.json.zip

oschwengers commented 2 years ago

Hi, indeed there is not a single marker protein that could be detected on your contigs, which is odd/interesting and hasn't occurred so far - at least not for an entire dataset. However, we do not have much experience with metagenome data so far.

So in principle, there are 2 different reasons that I can think of:

  1. Platon's marker protein sequences are actually not encoded on these contigs. In this case, Platon's database wouldn't cover the protein space encoded in your data. We're currently compiling an updated DB which could help here.
  2. There could be an error occurring. In order to check that, may I ask you to also provide the contigs.log file?

barbaracania commented 2 years ago

Good morning, Sure! Here is the .log file from the same run: contigs.log

oschwengers commented 2 years ago

I took a look at the logs and from a technical perspective, everything is just fine. However, there is indeed not a single blast (diamond) hit against the marker protein database, which so far has not occurred (at least not that I know of). This is very interesting and helpful to know in terms of metagenome analysis with Platon!

As mentioned above, I'm currently computing and compiling a database update which could help here - of course, this would require further investigation. As of today, it seems that Platon is not the right tool for your dataset. May I refer you to PlasFlow? Since Platon was initially developed with single isolates in mind, PlasFlow might provide better results, as it solely addresses metagenome data.

I'll leave this open until we've released the new database version and Platon v1.7, just to let you know. Again, thanks for trying Platon and reporting this! Best regards!