tormes_report.Rmd issue

geomili commented 3 years ago

Hi and many thanks for this extremely functional pipeline.

I've been using Tormes for a while now, but after uninstalling and reinstalling it I keep getting a ".html report file could not be created" error, which is caused by a strange error in the tormes_report.Rmd file, shown below:

The code in the tormes_report.Rmd file corresponding to lines 1799-1812 is the following:

Line 11662 is the following:

Any suggestions on how to fix this would be very much appreciated.

Kind regards,

Georgios

nmquijada commented 3 years ago

Hi Georgios, Thanks for your words and the screenshots! The first error lies in the "B19395_resfinder.tab" file. Can you open it to see what it looks like? It will be in antibiotic_resistance_genes/resfinder/ and also inside the compressed report_files.tgz file.

For the second error: I forced tormes to generate the report with UTF-8 encoding, but the warning may be appearing because of a conflict with your locale environment variables. You can check those variables by typing locale in your terminal.

In any case, the UTF-8 warning will not make the report fail, so let's first see what is happening with the file that is causing the main conflict. If you are able to run the report and the UTF-8 encoding is still giving issues, some strange characters may appear in the report. If that happens, we can solve it in several ways, but let's get the report to run properly first.
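As a minimal sketch of the locale check mentioned above, the snippet below asks the system for the active character map and prints the full locale settings only when it is not UTF-8. The helper name locale_is_utf8 is hypothetical and not part of tormes; `locale charmap` is the standard POSIX way to query the character map.

```shell
# Hypothetical helper (not part of tormes): succeed if the active
# locale's character map is reported as UTF-8.
locale_is_utf8() {
    [ "$(locale charmap 2>/dev/null)" = "UTF-8" ]
}

if locale_is_utf8; then
    echo "locale charmap is UTF-8"
else
    # Show LANG, LC_CTYPE, LC_ALL, etc. so the mismatch can be located.
    echo "locale charmap is not UTF-8; current settings:"
    locale
fi
```

If the check fails, the usual candidates to inspect are LANG, LC_CTYPE and LC_ALL in the printed settings.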

Could you please inspect the "B19395_resfinder.tab" file and let me know?

Thanks! Narciso

geomili commented 3 years ago

Hi Narciso,

Thank you for your prompt reply and for your advice.

The B19395_resfinder.tab file located in antibiotic_resistance_genes/resfinder/ is empty; meaning that it does not contain any identified ARGs.

Surprisingly, the B19395_resfinder.tab file is missing from report_files.tgz, as are all the other .tab files for this strain (card, argannot, etc.).

Regarding the locale environment variables, we are also using UTF-8 encoding.

Many thanks,

Georgios

nmquijada commented 3 years ago

Hi Georgios,

Even if B19395 does not have any AMRGs, the file shouldn't be empty, as it should contain at least the header. I can't see why it is missing from the report_files directory, as all the AMRG files are copied there without any conditional statement.

Did the other analyses of the pipeline work for that sample? Did the pipeline run well for the other samples?

Narciso

geomili commented 3 years ago

Hi Narciso,

Thanks for your reply. Sorry, when I said that the file is empty I meant that it doesn't contain any genes; the header is there, as it should be. There are 114 samples in this tormes run, and for all the other samples the .tab files seem to be present in the report_files directory.

I had a look, and the B19395 analysis seems to be present in the other directories, as it should be (annotation, virulence_genes, etc.).

E.g. snapshot of the contents of the B19395_annotation directory:

Georgios

nmquijada commented 3 years ago

Hi Georgios,

Ah OK! Thanks for the clarification. At least it means that the analysis worked. Perhaps there was some issue when copying the files to the report_files directory, as everything else seems to be OK.

Could you please do:

cp -f antibiotic_resistance_genes/*/B19395_*tab report_files/
cp -f virulence_genes/B19395_*tab report_files/
cd report_files/
./render_report.sh

This will generate a new report based on the information contained in the tormes_report.Rmd file and the report_files directory.
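Before re-rendering, it can help to see exactly which result tables never reached report_files/. The sketch below assumes it is run from the tormes output directory and that the layout matches the paths used in this thread (antibiotic_resistance_genes/<database>/*.tab and report_files/); the function name missing_from_report is hypothetical.

```shell
# Hypothetical helper: print every *.tab file under
# antibiotic_resistance_genes/*/ that has no copy in report_files/.
# Assumes the current directory is the tormes output directory.
missing_from_report() {
    for f in antibiotic_resistance_genes/*/*.tab; do
        # Skip the literal pattern if the glob matched nothing.
        [ -e "$f" ] || continue
        [ -e "report_files/$(basename "$f")" ] || echo "$f"
    done
}

missing_from_report
```

Because every path is quoted, file names that contain spaces are also listed correctly.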

Let me know! Narciso

geomili commented 3 years ago

Thanks for the suggestions Narciso.

On closer inspection of the output files, I think I've come across a minor bug.

There were 20 genomes (out of the 114) for which the pipeline didn't produce any output. Their files were empty; they had failed from the start of the assembly step. Having a closer look, I noticed that the issue was in the metadata file. I prepare the metadata file in Excel and then save it as a text (tab-delimited) file, which I then import into our HPC cluster. Sometimes the conversion from Excel to .tab introduces extra artificial tabs between the columns, which cause tormes to give a warning and stop running, as it should. These tabs can easily be removed using a Unix text editor on Linux (e.g. vi).

In this case, the conversion introduced an invisible extra space (not a tab) after some of the names under the "Samples" column. These didn't cause a "metadata file error" and the pipeline started as usual. But every file created carried this extra space with it. For example, if the name of the sample was B19525, the assembly file would be B19525 _assembly.tgz and so on. This extra space is causing the issue. Once this extra space is removed with the vi editor, everything works like a charm!
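A stray space like this is hard to spot by eye, so a quick check of the first column can save a failed run. The sketch below assumes a tab-delimited metadata file whose first column holds the sample names; the file name metadata.txt and the function name check_sample_names are hypothetical.

```shell
# Hypothetical helper: report every line whose first tab-separated field
# (the sample name) contains any whitespace. The brackets in the output
# make leading or trailing spaces visible.
check_sample_names() {
    awk -F'\t' '$1 ~ /[[:space:]]/ { printf "line %d: [%s]\n", NR, $1 }' "$1"
}

# Usage (metadata.txt is an assumed file name):
#   check_sample_names metadata.txt
# With GNU cat, every invisible character can also be displayed
# (tabs as ^I, line ends as $):
#   cat -A metadata.txt
```

No output means that no sample name in the first column contains whitespace.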

P.S. I'm still getting a few warning messages but nothing that stops the report being created.

I think this might be useful for other users of this pipeline that might be experiencing the same issue.

Thank you kindly for your time.

And many thanks for this great pipeline.

Georgios

nmquijada commented 3 years ago

Hi Georgios,

Thank you very much for the update! We are preparing the release of the next version of TORMES, so I will definitely include some more information in the documentation regarding the metadata file. Also, I will try to add a "warning" for sample names that contain spaces.

Indeed, dealing with Excel and Linux can be a pain... If you prepare the metadata file in Excel, it usually solves the incompatibility issues to copy the cells into Notepad first and then save that as a plain .txt file (and use this one as the metadata file). Additionally, I would recommend gaining some experience with the sed command to modify documents without needing vi/vim and so on. For instance, if you would like to replace every " " with "_" in file.txt:

sed -i "s/ /_/g" file.txt
# -i = modify the file in place
# s = "substitute"
# there are two fields delimited by "/": the first is the character(s) to look for, the second is what to replace them with
# in this case, " " is replaced by "_"
# g = globally (every " " in the file is replaced by "_", not just the first one on each line)

Using sed can save you a lot of time, especially when dealing with big files.
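For the specific problem in this thread, a narrower substitution may be safer than replacing every space, since descriptive columns can legitimately contain spaces. The sketch below removes only the spaces that sit directly before a tab or at the end of a line. It writes to standard output instead of editing in place, so the result can be inspected first. The function name strip_spaces_before_tabs is hypothetical.

```shell
# Hypothetical helper: delete runs of spaces that directly precede a tab
# or the end of a line; spaces inside a field are left untouched.
strip_spaces_before_tabs() {
    tab=$(printf '\t')                      # literal tab, portable across sed versions
    sed -E "s/ +(${tab}|\$)/\\1/g" "$1"
}

# Usage (metadata.txt is an assumed file name):
#   strip_spaces_before_tabs metadata.txt > metadata.clean.txt
```

Leading spaces at the start of a field are not handled by this sketch.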

So good that it worked! Regarding the UTF-8 thing, did it cause any errors in your report? You can check the encoding of any file with file and convert it with iconv. I am curious why this warning pops up on your system. Can you please paste the output of file -i tormes_report.Rmd?
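A minimal sketch of that check-and-convert workflow is shown below. Both function names are hypothetical, and ISO-8859-1 in the usage line is only an assumed example of a source encoding; the real one is whatever file reports. The -i option of file (MIME type and charset) is the GNU/Linux spelling used in this thread.

```shell
# Hypothetical helper: print the MIME type and charset that `file` detects.
show_encoding() {
    file -i "$1"
}

# Hypothetical helper: convert a file from a given encoding to UTF-8,
# writing to a new file so the original stays untouched.
#   $1 = source encoding, $2 = input file, $3 = output file
to_utf8() {
    iconv -f "$1" -t UTF-8 "$2" > "$3"
}

# Usage (the encoding and output name are assumptions for illustration):
#   show_encoding tormes_report.Rmd
#   to_utf8 ISO-8859-1 tormes_report.Rmd tormes_report.utf8.Rmd
```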

Cheers, Narciso

geomili commented 3 years ago

Hi Narciso,

Apologies for the delay in my response. That's great news; I'm very much looking forward to trying the new version of tormes once it's available.

Thank you for all the recommendations. I'll certainly try copying the Excel cells into Notepad next time, and will use sed for any modifications required.

Regarding the UTF-8 error, I didn't notice anything unusual in the report. Here is the output of file -i tormes_report.Rmd.

Best,

Georgios

nmquijada commented 3 years ago

Hi Georgios,

We just released the new version of TORMES (v.1.3.0)! Could you please update to this version? Older ones won't be maintained.
I will prepare a small tutorial for the preparation of the metadata file for the Wiki sometime soon.

I will close this issue now, but feel free to reopen!

Best, Narciso

nmquijada / tormes

tormes_report.Rmd issue #38