Closed matthewspeir closed 5 years ago
I do strip quotes if the file is a .csv file. CSV files can have quotes.
I've never seen quotes in a .tsv file, though. Quotes are needed in .csv files because they use a comma as the separator. .tsv files should never have quotes; this is an error from when the .tsv file was created. The R write.table call should have quote=F when writing a .tsv. Or he could have used write.tsv.
I'll look into stripping quotes...
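To illustrate why quoting matters for comma-separated files but not for tab-separated ones, here is a minimal sketch using Python's standard csv module. The second field ("heart, fetal") is a made-up value chosen only because it contains a comma; it is not taken from the files in this issue.

```python
import csv
import io

# Hypothetical row: a real cell name from this thread plus an invented
# annotation that happens to contain a comma.
row = ["09W0D_IVS_AAACGGGCACACGCTG", "heart, fetal"]

def write_row(fields, delimiter):
    """Serialize one row, quoting only fields that would otherwise be ambiguous."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter,
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(fields)
    return buf.getvalue()

csv_line = write_row(row, ",")   # the comma inside the field forces quotes
tsv_line = write_row(row, "\t")  # no tab inside any field, so no quotes

print(repr(csv_line))
print(repr(tsv_line))
```

With a comma delimiter the writer has to quote the second field to keep it in one column; with a tab delimiter nothing needs quoting, which is the situation quote=F asks for in R.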
On Wed, Aug 21, 2019 at 11:48 PM Matt Speir notifications@github.com wrote:
If you have quotes around your cell names in your expression matrix and metadata, cbBuild can't find the common names between the two:
INFO:root:Checking and reordering meta data to /usr/local/apache/htdocs-cells/forAB/fetal-combined-v1/meta.tsv
INFO:root:Reading sample names from /hive/users/mspeir/cellbrowserTest/ABlair_HeartOfCells/fetal_v1/fetalCombined_meta.tsv
INFO:root:Reading headers of file /hive/users/mspeir/cellbrowserTest/ABlair_HeartOfCells/fetal_v1/exprMatrix.tsv.gz
ERROR:root:Meta data and expression matrix have no single sample name in common. Sure that the expression matrix has one gene per row?
Here are the first two cells in the expression matrix:
gene "09W0D_IVS_AAACGGGCACACGCTG" "09W0D_IVS_AAACGGGTCATAACCG"
And here are the first two cells from the meta file:
"09W0D_IVS_AAACGGGCACACGCTG" "09W0D_IVS_AAACGGGTCATAACCG"
You can see that they are exactly the same cells, but cbBuild throws an error.
I can edit the files and remove the double quotes, but it would be nice if cbBuild just dealt with them and I didn't have to edit these files.
Command:
cbBuild -o /usr/local/apache/htdocs-cells/forAB/
Files:
/hive/users/mspeir/cellbrowserTest/ABlair_HeartOfCells/fetal_v1/withQuotes
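For the manual workaround mentioned above (removing the double quotes before running cbBuild), a rough sketch could look like the following. The function names and the example paths are hypothetical, and it assumes fields are tab-separated and that quotes only ever wrap a whole field.

```python
import gzip

def open_text(path, mode):
    """Open a plain or gzip-compressed text file, chosen by file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t", encoding="utf-8")
    return open(path, mode, encoding="utf-8")

def unquote_field(field):
    """Drop one pair of surrounding double quotes, if present."""
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        return field[1:-1]
    return field

def strip_quotes_tsv(in_path, out_path):
    """Copy a TSV file line by line, unquoting every tab-separated field."""
    with open_text(in_path, "r") as src, open_text(out_path, "w") as dst:
        for line in src:
            fields = line.rstrip("\r\n").split("\t")
            dst.write("\t".join(unquote_field(f) for f in fields) + "\n")

# Example call with placeholder file names:
# strip_quotes_tsv("exprMatrix.tsv.gz", "exprMatrix.noQuotes.tsv.gz")
```

Quotes in the middle of a field are left alone, so only the wrapping that R's write.table adds by default is removed.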
I found a way to reduce the amount of time needed for the additional parsing. The quote stripping will be in the next release. Thanks Matt!
On Thu, Aug 22, 2019 at 11:51 AM Maximilian Haeussler maximilianh@gmail.com wrote:
Oh. Playing with your example was good. It brought up a problem that you haven't found (yet): the old meta parser didn't accept quoted CSV files at all. So I fixed this unrelated issue, since quoted values are important for CSV files.
As for .tsv files, I'm stripping quotes now. This makes parsing take more time, though...
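As a sketch of what stripping quotes at parse time might look like, the snippet below unquotes fields while splitting and then intersects the sample names. This is not cbBuild's actual code: the function names are hypothetical, it assumes the meta file has one cell per row with the name in the first column, and the "skip lines that contain no quote character" shortcut is only one conceivable way to limit the extra parsing cost.

```python
def strip_quotes(field):
    """Remove one pair of surrounding double quotes, if present."""
    if len(field) >= 2 and field[0] == field[-1] == '"':
        return field[1:-1]
    return field

def split_tsv_line(line):
    """Split a tab-separated line and unquote its fields."""
    fields = line.rstrip("\r\n").split("\t")
    # Cheap fast path (an assumption, not cbBuild's method): lines without
    # any quote character need no per-field work.
    if '"' not in line:
        return fields
    return [strip_quotes(f) for f in fields]

def common_sample_names(matrix_header, meta_rows):
    """Return matrix sample names that also appear in the meta rows, in matrix order."""
    matrix_names = split_tsv_line(matrix_header)[1:]  # first column is the gene label
    meta_names = {split_tsv_line(row)[0] for row in meta_rows if row.strip()}
    return [name for name in matrix_names if name in meta_names]

header = 'gene\t"09W0D_IVS_AAACGGGCACACGCTG"\t"09W0D_IVS_AAACGGGTCATAACCG"\n'
meta = ['"09W0D_IVS_AAACGGGCACACGCTG"\n', '09W0D_IVS_AAACGGGTCATAACCG\n']
shared = common_sample_names(header, meta)
print(shared)
```

Because both sides are normalized before comparison, names match whether one file, both files, or neither file was written with quotes.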
Awesome. I'll test this after the next release and if all looks good, I'll close this ticket.
I'm closing this now. I think this is fixed everywhere.