Hi Neal,
After converting my files into a format similar to yours I was almost able to get the script running, but ran into this error:
Could this be due to my file sizes being too large? My OTU table has 32k OTUs. If so, I can trim it down to 20k, and potentially even further if I start removing singletons and doubletons, etc. Cheers, Sam
This doesn't look like an issue with the file size being too large, but it's still a good idea to remove singletons and doubletons before the analysis.
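For example, a minimal sketch of that kind of filter (assuming your table has samples in rows, OTUs in columns, and counts in every cell; adjust if yours is transposed):

```r
Y <- read.csv("Y.csv")

# Total abundance of each OTU across all samples
otu_totals <- colSums(Y)

# Keep only OTUs observed more than twice overall
# (this drops singletons and doubletons)
Y_filtered <- Y[, otu_totals > 2, drop = FALSE]

write.csv(Y_filtered, "Y-filtered.csv", row.names = FALSE)
```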
It appears there is a mismatch in dimensions somewhere, as it's attempting to access the 121st row of an object that only has 120 rows. Are you certain that the number of rows in X, Y, and Z are all the same? If so, there's something else going on. If you're able to send me the data files I can take a look myself, otherwise it's difficult to diagnose what exactly went wrong from the error message alone.
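A minimal sanity check you can run before fitting (a sketch, assuming X.csv, Y.csv, and Z.csv each have one row per sample):

```r
X <- read.csv("X.csv")
Y <- read.csv("Y.csv")
Z <- read.csv("Z.csv")

# All three inputs should agree on the number of samples
sapply(list(X = X, Y = Y, Z = Z), nrow)

# Stop early if anything disagrees
stopifnot(nrow(X) == nrow(Y), nrow(Y) == nrow(Z))
```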
Ok. I will try with and without filtering rare reads because I am somewhat interested in rare species.
And that makes sense, because my abundance table had an extra line in it. I removed it and now it is taking much longer to run, which probably means it's working! Will report back. Thank you
Hi Neal,
So I got the fit mcmc to work. It was in fact an extra line I had to remove. However, when I run the full script (which runs all the analyses at once, I am guessing?):

```
./nutnet-analysis/run-nutnet-analysis.sh -d nutnet-analysis/filtdata -o nutnet-analysis -f 120 -i 20000 -b 10000 -t 20 -c 1
```

I get all the mimix-no-factors outputs, so I am guessing that ran correctly. But there is supposed to be another folder, correct? I get this error many hours into the analysis:
```
Saving posterior samples of beta_var to /data/home/sdegregori/mimix/nutnet-analysis/mimix-no-factors/beta_var.tsv
Saving posterior samples of beta to /data/home/sdegregori/mimix/nutnet-analysis/mimix-no-factors/beta.tsv
Saving posterior samples of omega to /data/home/sdegregori/mimix/nutnet-analysis/mimix-no-factors/omega.tsv
Saving posterior samples of theta_var to /data/home/sdegregori/mimix/nutnet-analysis/mimix-no-factors/theta_var.tsv
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
✔ ggplot2 3.2.1     ✔ purrr   0.3.4
✔ tibble  3.0.3     ✔ dplyr   1.0.0
✔ tidyr   1.1.0     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Attaching package: ‘reshape2’

The following object is masked from ‘package:tidyr’:

    smiths

[1] "Proportion of observations within 95% credible intervals: 0.458333333333333"
[1] "Proportion of observations within 95% credible intervals: 0.983333333333333"
[1] "Proportion of observations within 95% credible intervals: 1"
[1] "Proportion of observations within 95% credible intervals: 0.833333333333333"
Warning message:
`data_frame()` is deprecated as of tibble 1.1.0.
Please use `tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
[1] NaN
[1] NaN
[1] NaN
Parsed with column specification:
cols(
  .default = col_double()
)
See spec(...) for full column specifications.
Error in data.frame(otus = ordered(otu_names, levels = rev(otu_names)), :
  arguments imply differing number of rows: 0, 2759
Execution halted
```
It looks like an error with the row numbers but I have double checked a million times and the taxonomy file has the same number of rows as my abundance table.
I'll attach the files I am using so you can see them. For X.csv I put my main factor (nutrient treatment) first and then three other columns specifying microbiome type (hindgut, foregut, or algae). Hopefully I did that correctly. My Z.csv specifies site which was the only blocking factor. There were two sites in total.
I appreciate all the help you have provided thus far. Bayesian is new to me so this has not been a walk in the park.
Cheers,
Sam
X.txt
Y.txt
Z.txt
Hi Neal,
Just wanted to bump this in case it was lost in your inbox. Apologies if you have seen it and are just busy.
Hi Sam,
Congrats on getting the MCMC script to run correctly. That's the biggest hurdle.
The purpose of the `run-nutnet-analysis.sh` script is to perform the analyses on the NutNet data as described in the paper. If you look under `nutnet-analysis/full-data` you'll find a `tax.csv` file that maps the microbiome OTUs to their taxonomic names. You're getting an error because your data folder does not contain that file, hence

```
Error in data.frame(otus = ordered(otu_names, levels = rev(otu_names)), :
  arguments imply differing number of rows: 0, 2759
```

In other words, there are no OTU names found (0 rows) even though your `Y.csv` has 2,759 OTUs present.
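A quick way to confirm that on your end (a sketch; I'm assuming the data folder is the `-d` directory you passed to the script, and that `Y.csv` has one column per OTU):

```r
# The summarize script expects a taxonomy file alongside X, Y, and Z
file.exists("nutnet-analysis/filtdata/tax.csv")

tax <- read.csv("nutnet-analysis/filtdata/tax.csv")
Y   <- read.csv("nutnet-analysis/filtdata/Y.csv")

# One taxonomy row per OTU column in Y (2,759 in your case)
nrow(tax)
ncol(Y)
```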
Avoid using the `run-nutnet-analysis.sh` script on your own data, as there's really no reason for you to run PERMANOVA and MIMIX w/o Factors. It appears the MCMC script fits MIMIX to your data and saves the posterior samples correctly (fantastic!), so I recommend you go through the `summarize-nutnet-analysis.R` script and choose what you would like to run on your results.
I'm afraid it will take some understanding of R and how to work with posterior samples from Bayesian models to make sure you're summarizing the right things. However, the code in there is designed to reproduce the figures in the paper, so if there is one in particular you would like to re-create, you should be able to modify the R code without too much work and run it on your own results.
Ah, I see your files are in `txt`. Can you change them to `csv`? If you look in the `summarize-nutnet-analysis.R` script, it is looking for csv files, not txt ones.
The Julia code is running correctly, so any errors you're getting now are in `summarize-nutnet-analysis.R`. That's great news, because it means you shouldn't need to run the MCMC again; the R script simply takes the results of the MCMC and produces numerical and visual summaries from them.
About your files, X and Y look fine based on what you've said. If you only have one blocking factor (site) and there are only two sites, then Z should be a single column like

```
1
1
...
1
2
2
...
2
```

(where the dots are filled in with 1s and 2s, respectively, up to the right number of observations). One way to build that column is sketched below.
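Something like this (a sketch; `sites` is a hypothetical vector of your site labels, in the same order as your samples):

```r
# Hypothetical site labels, one per sample, in the same order as the rows of Y
sites <- c("Site1", "Site1", "Site2", "Site2", "Site1", "Site1")

# Recode to consecutive integers: first site -> 1, second site -> 2
Z <- as.integer(factor(sites))
Z
#> [1] 1 1 2 2 1 1
```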
Correction: you are encountering an error after the MIMIX w/o Factors section of `run-nutnet-analysis.sh`, but there is still the MIMIX Gaussian section (which there's no need for you to run), and then MIMIX Dirichlet-Laplace (this is the version of MIMIX you really want to run; it's the same model implemented in `fit-mcmc.jl`, which you've already been able to run successfully).
Hi Neal,
That is strange, because I do have a tax.csv file (all of my files are csv actually; I just switched them to txt because GitHub was not letting me upload csv). But since the full nutnet script isn't that useful, like you pointed out, I guess it isn't an issue for me.
The MCMC code gave me:
```
Lambda-postmean.tsv  b_var.tsv  beta.tsv  g_var.tsv  omega.tsv  theta_var.tsv
```
And I see how I can input these into the R code.
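For example, I'm guessing something like this will pull one of them in (an assumption on my part: each .tsv stores one posterior draw per row, and `mimix-output/` is just a placeholder for wherever the script wrote its output):

```r
library(readr)

# Placeholder path: wherever the MCMC script saved its posterior samples
beta <- read_tsv("mimix-output/beta.tsv")

# Posterior mean of each element of beta
beta_postmean <- colMeans(beta)
```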
Thank you for the help and it seems like after much trial and error I have finally gotten it to work! Cheers, Sam
Oh, so if I have my samples in the order `Site1 Site1 Site2 Site2 Site1 Site1`, then the Z file should be `1 1 2 2 1 1` instead of `1 1 2 2 3 3`?
I thought for some reason that each block was dependent on the other covariates, which would give me 12 levels for my Z factor, but now I see that it is only two since there are only two sites.
Glad you got it to work! Any idea what the fix was?
And yes, the site configuration should be `1 1 2 2 1 1` in this case.
No idea. When I use my unfiltered OTU file, which has 32k OTUs, I still get an error. I don't think I actually posted it here, so I'll see if I can find it.
And then for some reason when I filtered out the singletons, which reduced the file to about 2.8k OTUs, it works like a charm.
Hmm, not sure what's going on there, it sounds like something to do with how the OTU names are being parsed. I'm going to close this issue for now, but please comment with an update if you find a solution. Thanks Sam!