vanmooylipidomics / LOBSTAHS

Git repository for the R package "LOBSTAHS" (Lipid and Oxylipin Biomarker Screening Through Adduct Hierarchy Sequences)

Creating a custom database #17

Closed MarcNIOZ closed 7 years ago

MarcNIOZ commented 7 years ago

Hello all,

I tried to generate a LOBSTAHS database but encountered a couple of errors.

When I tried to create a database with the default .csv file, I got the following:

generateLOBdbase(polarity=c("positive","negative"), gen.csv = FALSE, component.defs = "/export/data/mbesseling/Documents/Orbitrap/LOBSTAHS_componentCompTable.csv", AIH.defs=NULL, acyl.ranges=NULL, oxy.ranges=NULL)
Error in read.table(componentTableLoc, sep = ",", header = TRUE, row.names = 1) :
  duplicate 'row.names' are not allowed

When I tried to create a database with some additional compounds I got the following:

generateLOBdbase(polarity=c("positive","negative"), gen.csv = FALSE, component.defs = "/export/data/mbesseling/Documents/Orbitrap/LOBSTAHS_componentCompTable.csv", AIH.defs=NULL, acyl.ranges=NULL, oxy.ranges=NULL)
Error in calcComponentMasses(componentTable.loc, use.default.componentTable) :
  Different number of chemical building blocks in the component composition matrix and in the onboard list of exact masses. Check your composition matrix carefully. Aborting...

Perhaps somebody can help me with this.

Thanks in advance,

Marc Besseling
PhD student at the Netherlands Institute for Sea Research (NIOZ)

jamesrco commented 7 years ago

Hi @MarcNIOZ,

Let's start with the first (less complex) error, which you receive when attempting to recreate the default database. When I ran your code using a clean copy of the component definitions file (which I just downloaded from https://github.com/vanmooylipidomics/LOBSTAHS/blob/master/inst/doc/csv/LOBSTAHS_componentCompTable.csv), I did not receive the same error.

However, I just took a look at the copy of the default LOBSTAHS_componentCompTable.csv file you are using (which you kindly sent along by e-mail), and the problem appears to be about 130 empty lines of csv data at the end of your file. Here is a sample of what it looks like when I open the file in a text editor:

UQ10:10,59,90,0,0,4,0,0,0,0,0,0,0,0,0,0,ubiquinone,ubiquinone,DB_unique_species
UQ11:11,64,98,0,0,4,0,0,0,0,0,0,0,0,0,0,ubiquinone,ubiquinone,DB_unique_species
UQ12:12,69,106,0,0,4,0,0,0,0,0,0,0,0,0,0,ubiquinone,ubiquinone,DB_unique_species
UQ13:13,74,114,0,0,4,0,0,0,0,0,0,0,0,0,0,ubiquinone,ubiquinone,DB_unique_species
PDMS6,12,36,0,0,6,0,0,0,0,0,0,0,0,0,6,PDMS,PDMS,DB_unique_species
PDMS7,14,42,0,0,7,0,0,0,0,0,0,0,0,0,7,PDMS,PDMS,DB_unique_species
PDMS8,16,48,0,0,8,0,0,0,0,0,0,0,0,0,8,PDMS,PDMS,DB_unique_species
PDMS9,18,54,0,0,9,0,0,0,0,0,0,0,0,0,9,PDMS,PDMS,DB_unique_species
PDMS10,20,60,0,0,10,0,0,0,0,0,0,0,0,0,10,PDMS,PDMS,DB_unique_species
PDMS11,22,66,0,0,11,0,0,0,0,0,0,0,0,0,11,PDMS,PDMS,DB_unique_species
PDMS12,24,72,0,0,12,0,0,0,0,0,0,0,0,0,12,PDMS,PDMS,DB_unique_species
PDMS13,26,78,0,0,13,0,0,0,0,0,0,0,0,0,13,PDMS,PDMS,DB_unique_species
PDMS14,28,84,0,0,14,0,0,0,0,0,0,0,0,0,14,PDMS,PDMS,DB_unique_species
PDMS15,30,90,0,0,15,0,0,0,0,0,0,0,0,0,15,PDMS,PDMS,DB_unique_species
PDMS16,32,96,0,0,16,0,0,0,0,0,0,0,0,0,16,PDMS,PDMS,DB_unique_species
PDMS17,34,102,0,0,17,0,0,0,0,0,0,0,0,0,17,PDMS,PDMS,DB_unique_species
PDMS18,36,108,0,0,18,0,0,0,0,0,0,0,0,0,18,PDMS,PDMS,DB_unique_species
PDMS19,38,114,0,0,19,0,0,0,0,0,0,0,0,0,19,PDMS,PDMS,DB_unique_species
PDMS20,40,120,0,0,20,0,0,0,0,0,0,0,0,0,20,PDMS,PDMS,DB_unique_species
PDMS21,42,126,0,0,21,0,0,0,0,0,0,0,0,0,21,PDMS,PDMS,DB_unique_species
PDMS22,44,132,0,0,22,0,0,0,0,0,0,0,0,0,22,PDMS,PDMS,DB_unique_species
PDMS23,46,138,0,0,23,0,0,0,0,0,0,0,0,0,23,PDMS,PDMS,DB_unique_species
PDMS24,48,144,0,0,24,0,0,0,0,0,0,0,0,0,24,PDMS,PDMS,DB_unique_species
PDMS25,50,150,0,0,25,0,0,0,0,0,0,0,0,0,25,PDMS,PDMS,DB_unique_species
PDMS26,52,156,0,0,26,0,0,0,0,0,0,0,0,0,26,PDMS,PDMS,DB_unique_species
PDMS27,54,162,0,0,27,0,0,0,0,0,0,0,0,0,27,PDMS,PDMS,DB_unique_species
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,

Those "empty" lines continue for a bit. Perhaps these were somehow retained after you added new entries to the file and then deleted them? This can happen if you don't actually delete the "empty" lines from the file. What were you using to edit your .csv files? Microsoft Excel is terrible about this. Take a look and see if you can fix the problem. Will take a look at the second issue here soon.
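For anyone hitting the same "duplicate 'row.names'" error, the cleanup described above can be sketched in R. This is only a minimal illustration, not part of LOBSTAHS: the helper name strip_empty_csv_rows and the small demonstration table are made up for the example. The likely mechanism is that every comma-only row has the same empty first field, which read.table() then tries to use as a row name.

```r
# Hypothetical helper (not part of LOBSTAHS): drop rows whose fields are all
# empty, such as the ",,,,,," lines a spreadsheet program can leave behind.
strip_empty_csv_rows <- function(in_path, out_path) {
  lines <- readLines(in_path, warn = FALSE)
  # A row counts as "empty" if it contains only commas and whitespace.
  is_empty <- grepl("^[,[:space:]]*$", lines)
  writeLines(lines[!is_empty], out_path)
  invisible(sum(is_empty))  # number of rows removed
}

# Self-contained demonstration on a made-up three-column table.
demo_in <- tempfile(fileext = ".csv")
demo_out <- tempfile(fileext = ".csv")
writeLines(c("name,C,H", "PDMS26,52,156", "PDMS27,54,162", ",,", ",,"), demo_in)

n_removed <- strip_empty_csv_rows(demo_in, demo_out)

# Same read.table() arguments as in the error message above.
cleaned <- read.table(demo_out, sep = ",", header = TRUE, row.names = 1)
```

Writing to a separate output file keeps the original intact, so the two can be compared before pointing component.defs at the cleaned copy.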

jamesrco commented 7 years ago

I am also opening an issue #18 for a new feature request with the @vanmooylipidomics/lobstahs-devel-team to see if we can't write something to make LOBSTAHS smart enough to correct/ignore that sort of problem in the future.

MarcNIOZ commented 7 years ago

Hello Jamie,

Thanks for all the trouble. I used Excel for this, and it is possible that I tried it with the added compounds and then removed them for the default option. This probably kept some of the lines “open”. I will check whether this is also the case with the added-compounds error.

MarcNIOZ commented 7 years ago

OK, I downloaded the original database file to try again. However, now I get the same error as with the added-compounds file.

generateLOBdbase(polarity=c("positive","negative"), gen.csv = FALSE, component.defs = "LOBSTAHS_componentCompTable.csv", AIH.defs=NULL, acyl.ranges=NULL, oxy.ranges=NULL)
Error in calcComponentMasses(componentTable.loc, use.default.componentTable) :
  Different number of chemical building blocks in the component composition matrix and in the onboard list of exact masses. Check your composition matrix carefully. Aborting...

jamesrco commented 7 years ago

Hi Marc:

I am unable to reproduce the calcComponentMasses building blocks error when I regenerate the default databases in this way, and I am able to generate the databases without issue. Tell me what you get when you try the below:

ncol(read.table("LOBSTAHS_componentCompTable.csv", sep = ",", header = TRUE, row.names = 1))

Result should be 18.

MarcNIOZ commented 7 years ago

Hmm, strange. I checked it and it also gives 18.

Is there a way to download the original .csv file? Or can you send it to me? Perhaps it has something to do with Excel and the conversion to .csv.

jamesrco commented 7 years ago

Did you download directly from here: https://github.com/vanmooylipidomics/LOBSTAHS/blob/master/inst/doc/csv/LOBSTAHS_componentCompTable.csv ?

MarcNIOZ commented 7 years ago

I downloaded the Excel file including the instructions.

From the following page: https://github.com/vanmooylipidomics/LOBSTAHS/blob/master/inst/doc/xlsx/LOBSTAHS_componentCompTable.xlsx

jamesrco commented 7 years ago

What if you try using the .csv from the link above, do you still receive the error?

I also just tried again, this time by saving the second tab of the LOBSTAHS_componentCompTable.xlsx spreadsheet as a .csv file; it worked fine for me.

MarcNIOZ commented 7 years ago

I tried it (by right clicking on raw and then save as). However I couldn't download it as it says "insufficient rights" (translated from Dutch).

jamesrco commented 7 years ago

What is "it" and "raw," and where did you attempt to download the file from? Please be specific -- hard to recreate a workflow with nondescriptive terms.

jamesrco commented 7 years ago

Sorry, you meant you clicked the "raw" tab on this page: https://github.com/vanmooylipidomics/LOBSTAHS/blob/master/inst/doc/csv/LOBSTAHS_componentCompTable.csv and then attempted to download? I am not sure what that error is about; it sounds like a problem specific to your browser or OS. What if you just copy the text, paste it into a text document, and then save it?

MarcNIOZ commented 7 years ago

Ah yes, sorry, it is getting late here. I was about to make a screenshot.

You were right, Chrome didn't let me download it. I was able to do it with Firefox but still got the same error.

jamesrco commented 7 years ago

After much discussion and trial and error (mostly via a long email chain between @jamesrco and @MarcNIOZ), we determined that the second error was caused by a critical commit that did not get passed from the LOBSTAHS GitHub repository to Bioconductor in time for the Bioconductor 3.5 release. I've pushed this commit and a few others to both the devel and release-3.5 Bioconductor branches. Any user installing (or re-installing) the package after roughly noon tomorrow (June 28, 2017) using

source("https://bioconductor.org/biocLite.R")
biocLite("LOBSTAHS")

will get the correct code, which contains the necessary fixes. The revised code requires that the custom LOBSTAHS_componentCompTable (if used) contain 18 columns, including a column for silicon atoms ("Si"). Thanks to @MarcNIOZ for helping to figure this one out.
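Before passing a custom table to generateLOBdbase(), the two requirements stated above (18 columns and an "Si" column) can be checked by hand. The following is only a sketch under assumptions: check_component_table is a hypothetical helper, not the check that LOBSTAHS itself performs, and the tiny demonstration tables use made-up column sets so the example stays short. The column count is taken with row.names = 1, matching the ncol() test earlier in this thread.

```r
# Hypothetical pre-flight check (not the check built into LOBSTAHS).
# Returns a character vector of problems; length 0 means none were found.
check_component_table <- function(path, expected_cols = 18, required = "Si") {
  tab <- read.table(path, sep = ",", header = TRUE, row.names = 1)
  problems <- character(0)
  if (ncol(tab) != expected_cols) {
    problems <- c(problems,
                  sprintf("expected %d columns, found %d", expected_cols, ncol(tab)))
  }
  missing_cols <- setdiff(required, colnames(tab))
  if (length(missing_cols) > 0) {
    problems <- c(problems,
                  paste("missing column(s):", paste(missing_cols, collapse = ", ")))
  }
  problems
}

# Demonstration with made-up three-column tables (a real table has 18).
with_si <- tempfile(fileext = ".csv")
without_si <- tempfile(fileext = ".csv")
writeLines(c("name,C,H,Si", "PDMS6,12,36,6"), with_si)
writeLines(c("name,C,H,O", "PDMS6,12,36,6"), without_si)

ok_result <- check_component_table(with_si, expected_cols = 3)
bad_result <- check_component_table(without_si, expected_cols = 3)
```

For a real table, calling check_component_table("LOBSTAHS_componentCompTable.csv") with the defaults would apply the 18-column and "Si" requirements directly.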

Leaving this issue open for a day or so to make sure the changes propagate, then will close it.

lee-t commented 7 years ago

On a potentially unrelated note, this also cleared up my earlier issue with the vignette. I was using the Bioconductor version to run it before. I cloned the latest master and it seems to be working properly.

jamesrco commented 7 years ago

Also, added a check in the generateLOBdbase() code (commits eee8703 and c58f0af) which will catch future users who are supplying a table without an "Si" column and provide them with some specific feedback to correct the error.

jamesrco commented 7 years ago

@lee-t Remind me to walk you through how we get files from this Git repo to the correct place on the Bioconductor Subversion server. It is far from obvious, and very easy to screw up. I've messed it up a bunch of times, and in one of those instances it took me a day at the command line to fix everything. Supposedly they're migrating to pure Git soon, but the project seems to be on hold.

jamesrco commented 7 years ago

Closing this one out. The Bioconductor release installation now loads LOBSTAHS v1.2.1, which contains the necessary fixes.
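To confirm that an installation is at least at the version named above, the version strings can be compared in R. This is a minimal sketch: has_fix is a hypothetical helper, and it takes the version as a string so that the example runs even where LOBSTAHS is not installed.

```r
# Hypothetical helper: TRUE if `installed` is at least the version named in
# this thread (v1.2.1). package_version() compares component by component,
# so "1.10.0" is correctly treated as newer than "1.2.1".
has_fix <- function(installed, minimum = "1.2.1") {
  package_version(installed) >= package_version(minimum)
}

older <- has_fix("1.2.0")
fixed <- has_fix("1.2.1")
newer <- has_fix("1.10.0")

# On a machine with LOBSTAHS installed, the real version could be passed in:
# has_fix(as.character(utils::packageVersion("LOBSTAHS")))
```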