simroux / VirSorter

Source code of the VirSorter tool, also available as an App on CyVerse/iVirus (https://de.iplantcollaborative.org/de/)
GNU General Public License v2.0
104 stars, 30 forks

Multiple errors given by the new VirSorter script #52

Closed: genomics-pixel closed this issue 4 years ago

genomics-pixel commented 4 years ago

Dear Simon,

Thank you for your generous help with VirSorter earlier.

I am sorry to bring up another issue, but when I tried the new VirSorter script (i.e. the one made available after issue #48) on my larger data set, it gave me multiple errors. Attached: virsorter_err_log_for_simon.txt (this file contains all of the errors produced across the different VirSorter analyses).

From the list of errors, it seems that there are mainly two types of issue.

  1. Failure of "cat" command to find "r_2/Contigs_prots_vs_New_unclustered.tab"
  2. Problem with usage of uninitialized value

If you could please take the time to look at this, that would be fantastic.

Sincerely, genomics-pixel

simroux commented 4 years ago

Hi, thanks for reporting these. The good news is that none of these are real errors; they are all warnings. The "uninitialized value" ones should have been fixed in the last release (Sep 16). The failure of the "cat" command to find "r_2/Contigs_prots_vs_New_unclustered.tab" is most likely a warning as well, because it only occurs in "r_2" and not in the previous rounds. I just pushed a new version in which this case is anticipated, so the warning should no longer be printed.

Let me know if you see anything else! Best, Simon

genomics-pixel commented 4 years ago

Dear Simon,

Great to know that these messages are just warnings and not errors! Also many thanks for putting up a new version of VirSorter!!

I will have the newest version of your VirSorter installed in our computer cluster and re-run the analysis!

Again thanks for your fantastic help.

Sincerely, genomics-pixel

genomics-pixel commented 4 years ago

Dear Simon,

I tried the new VirSorter script and found that there were still some errors/warnings. virsorter_err_log_20191003.txt ~List_of_errors_2019_10_03.txt~ (Sorry the file I initially uploaded was somehow messed up. Please see "virsorter_err_log_20191003.txt")

The errors/warnings seem to be related to "Use of uninitialized value".

If you could tell me whether this affects the results or not that would be great.

Many thanks in advance

Sincerely, genomics-pixel

simroux commented 4 years ago

Hi, so these are all warnings and won't impact the results at all. Which means, hopefully, you should be all good from now on :-)
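One quick way to confirm that a log like the ones attached above contains only these two kinds of message is to tally its lines by type. The sketch below is not part of VirSorter: `triage_log` and the sample lines are made up for illustration, and the `cat` pattern assumes the usual GNU coreutils wording ("No such file or directory"), which may differ on other systems.

```python
import re
from collections import Counter

# Prefix of the Perl warning quoted in this thread.
UNINITIALIZED = re.compile(r"^Use of uninitialized value")
# Assumed wording of a missing-file message from cat (GNU coreutils style).
CAT_MISSING = re.compile(r"^cat: .+: No such file or directory")


def triage_log(lines):
    """Count known warning types and collect every line that matches neither."""
    counts = Counter()
    unrecognized = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if UNINITIALIZED.match(line):
            counts["uninitialized_value"] += 1
        elif CAT_MISSING.match(line):
            counts["cat_missing_file"] += 1
        else:
            counts["unrecognized"] += 1
            unrecognized.append(line)
    return counts, unrecognized


# Hypothetical sample lines, NOT taken from the attached logs.
SAMPLE_LOG = [
    "Use of uninitialized value $score in numeric gt (>) at example.pl line 12.",
    "Use of uninitialized value in concatenation (.) or string at example.pl line 40.",
    "cat: r_2/Contigs_prots_vs_New_unclustered.tab: No such file or directory",
    "Step 3 failed, we stop there",
]

if __name__ == "__main__":
    counts, unrecognized = triage_log(SAMPLE_LOG)
    print(dict(counts))
    for line in unrecognized:
        print("needs a manual look:", line)
```

Lines that land in the unrecognized bucket are not necessarily errors; they are simply the ones worth reading by hand. For a real log, pass the open file object, e.g. `triage_log(open("virsorter_err_log_20191003.txt"))`.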

genomics-pixel commented 4 years ago

Dear Simon,

It's great to know that these messages are just warnings and should not affect the results!

Currently, I am looking through the VirSorter results and noticed that there are still some overlapping viral regions. (VirSorter was run with version eadcee5, i.e. the newest version, of wrapper_phage_contigs_sorter_iPlant.pl.)

However, these overlapping viral regions are unlike the ones I have reported in previous issues as they do not have identical start or end positions.

Example:

>VIRSorter_k251_76503_flag_1_multi_82_0000_len_18387SAMPLE10gene_6_gene_17-4753-17933-cat_4

  • Start coordinate 4,753 bp
  • End coordinate 17,933 bp

>VIRSorter_k251_76503_flag_1_multi_82_0000_len_18387SAMPLE10gene_5_gene_15-3787-14472-cat_5

  • Start coordinate 3,787 bp
  • End coordinate 14,472 bp

If I understand your paper correctly, VirSorter uses sliding windows to identify regions enriched with viral signatures.

Therefore, my guess is that such cases are meant to be detected by VirSorter and thus should not be a problem, but I am not certain.

I am sorry to ask you so many questions, but if you could please enlighten me on this matter I would appreciate it very much.

Thank you in advance.

Sincerely, genomics-pixel

P.S.

I've looked through the other files in the result directory and now I am less sure if the above situation is intended behaviour or a bug.

I have looked at Metric_files/VIRSorter_phage_signal.tab and found that only VIRSorter_k251_76503_flag_1_multi_82_0000_len_18387SAMPLE10gene_5_gene_15-3787-14472-cat_5 was present in this file, while both of the overlapping viral regions were listed in VIRSorter_global_phage_signal.tab.

This situation is quite similar to the bug I have described earlier in https://github.com/simroux/VirSorter/issues/48#issuecomment-527840385.

If you could please take a look at these files I would appreciate it.

simroux commented 4 years ago

Hi,

Yes, it is a known VirSorter limitation that these kinds of overlapping prophage predictions exist. So it is "working as intended", in the sense that these are problematic situations that VirSorter can't automatically resolve by itself.

Best, Simon

genomics-pixel commented 4 years ago

Dear Simon,

Many thanks for your helpful reply! I now understand that in some rare situations overlapping viral regions will be predicted.

Would it make sense to take such overlapping predictions as they are and regard both overlapping regions as possible prophage regions?

Sincerely, genomics-pixel

simroux commented 4 years ago

In my opinion yes, it shouldn't impact things too much. Another option is to only select one of the two when prophages are overlapping (by manually looking at the sequence annotation).
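To find the pairs that would need such a manual look, the coordinates embedded in the region names can be compared directly. The following is a minimal sketch, not VirSorter code: the name layout is inferred only from the two example IDs quoted earlier in this thread, coordinates are assumed to be 1-based and inclusive, and `parse_region`, `overlap_bp` and `find_overlaps` are hypothetical helper names.

```python
import re
from itertools import combinations
from typing import NamedTuple

# Assumed layout, inferred from the two example IDs in this thread:
#   <contig>gene_<i>_gene_<j>-<start>-<end>-cat_<n>
REGION_ID = re.compile(
    r"^>?(?P<contig>.+?)[-_]?gene_\d+_gene_\d+"
    r"-(?P<start>\d+)-(?P<end>\d+)-cat_(?P<cat>\d+)$"
)


class Region(NamedTuple):
    contig: str
    start: int
    end: int
    category: int


def parse_region(header: str) -> Region:
    """Extract contig name, coordinates and category from a region name."""
    match = REGION_ID.match(header.strip())
    if match is None:
        raise ValueError(f"unexpected region name: {header!r}")
    return Region(
        match["contig"], int(match["start"]), int(match["end"]), int(match["cat"])
    )


def overlap_bp(a: Region, b: Region) -> int:
    """Number of shared positions, assuming 1-based inclusive coordinates."""
    if a.contig != b.contig:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def find_overlaps(headers):
    """Return (region, region, shared_bp) for every overlapping pair."""
    regions = [parse_region(h) for h in headers]
    pairs = []
    for a, b in combinations(regions, 2):
        shared = overlap_bp(a, b)
        if shared > 0:
            pairs.append((a, b, shared))
    return pairs


if __name__ == "__main__":
    examples = [
        ">VIRSorter_k251_76503_flag_1_multi_82_0000_len_18387SAMPLE10gene_6_gene_17-4753-17933-cat_4",
        ">VIRSorter_k251_76503_flag_1_multi_82_0000_len_18387SAMPLE10gene_5_gene_15-3787-14472-cat_5",
    ]
    for a, b, shared in find_overlaps(examples):
        print(f"{a.contig}: {a.start}-{a.end} and {b.start}-{b.end} share {shared} bp")
```

The pairwise comparison is quadratic in the number of regions, which is fine for a per-sample list of predictions; it only flags candidates and does not decide which of the two regions to keep.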

genomics-pixel commented 4 years ago

Dear Simon,

Thank you for your quick reply and for explaining how to handle situations where predicted viral regions overlap each other. Since both ways of handling overlaps are fine, I think I will regard both overlapping regions as possible prophage regions.

VirSorter is a really fantastic virus prediction tool and I greatly appreciate your generous help in using it. Again many thanks!

Sincerely, genomics-pixel