[Section] Data management

jordenrabasco commented 2 years ago

This is a data management section: I essentially massage the data and naming to make them easier to use in the later visualizations. Should this remain its own section, or should I merge it with the later sections, since changing titles and the like has no real biological significance?

https://github.com/jordenrabasco/Long_read_processing_tutorial/blob/afa1a962b305b79b0473a644cd9133a992bfa9ea/long%20read%20Tutorial.Rmd#L177-L197

benjjneb commented 2 years ago

What is the purpose of this section? Are we teaching a tutorial user how to save RDS files and reload for later? Or... something else?

Also, there is no Silva 128.

jordenrabasco commented 2 years ago

The purpose of the section is to massage the naming and data into a more graphable format. Should I hide this section with the "include=FALSE" chunk option? I also keep the saving and reloading of the R objects in case users need to stop the tutorial for any reason after the main analysis portion is complete. I thought it might be useful in case a user wanted to make different post-analysis graphs than the ones provided. I am a bit confused by the last comment. Did you mean there is no trained data in the folder? To download the Silva 128 file, there is a link just before the taxonomy step. I decided to include an R object save there, as it takes a while to load in and train the data in R.

benjjneb commented 2 years ago

> The purpose of the section is to massage the naming and data into a more graphable format.

It seems like it would be better to just rename the files so the data massaging isn't necessary. Plus, once a sample metadata file is added (with a filename column), it could be used in the usual R way, by matching on that column, to pull out any sample-ID info that is needed.
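A minimal sketch of that approach in R, assuming a metadata table with hypothetical columns `filename`, `sample_id`, and `treatment`, and a toy matrix standing in for the sequence table (all file names and values are made up for illustration). Rows are relabeled by looking each filename up in the metadata rather than by string-editing the file names:

```r
# Hypothetical sample metadata: one row per input file.
# In practice this would be read from a file, e.g. read.csv("sample_metadata.csv");
# that file name and these column names are assumptions.
metadata <- data.frame(
  filename  = c("run1_bc01.fastq.gz", "run1_bc02.fastq.gz", "run1_bc03.fastq.gz"),
  sample_id = c("Soil_A", "Soil_B", "Soil_C"),
  treatment = c("control", "control", "amended"),
  stringsAsFactors = FALSE
)

# Toy stand-in for a dada2-style sequence table: rows are named by input file
seqtab <- matrix(
  c(10, 0, 5, 3, 7, 1), nrow = 3,
  dimnames = list(c("run1_bc02.fastq.gz", "run1_bc01.fastq.gz", "run1_bc03.fastq.gz"),
                  c("ASV1", "ASV2"))
)

# Look up each row's metadata by filename instead of parsing the filename string
idx <- match(rownames(seqtab), metadata$filename)
stopifnot(!anyNA(idx))  # every file in the table must be listed in the metadata
rownames(seqtab) <- metadata$sample_id[idx]

# Metadata reordered to line up with the table rows, keyed by sample ID
sample_info <- metadata[idx, ]
rownames(sample_info) <- sample_info$sample_id
```

Because `match()` returns row positions, the same index both relabels the table and reorders the metadata to line up with it, which is what most plotting code expects.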

> I also keep the saving and reloading of the R objects in case users need to stop the tutorial for any reason after the main analysis portion is complete. I thought it might be useful in case a user wanted to make different post-analysis graphs than the ones provided.

That may be useful enough to keep (not sure), but if it is in the tutorial, it needs to be explained: what an RDS file is, why we're saving and reloading it, etc. I think there was some brief text like this in the STAMPS long-read lab I sent you.
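A minimal sketch of what that explanation could show: an RDS file holds a single R object written to disk with base R's `saveRDS()`, and `readRDS()` brings it back, so a user can stop after the slow steps and pick up again later. The object and file name below are hypothetical stand-ins for the tutorial's real outputs:

```r
# Toy stand-in for an object produced by the slow main analysis (hypothetical values)
seqtab_nochim <- matrix(
  c(10, 0, 5, 3, 7, 1), nrow = 3,
  dimnames = list(c("Soil_A", "Soil_B", "Soil_C"), c("ASV1", "ASV2"))
)

# Hypothetical path; tempdir() keeps this example out of the project folder.
# In the tutorial this would be a file saved alongside the other outputs.
rds_path <- file.path(tempdir(), "seqtab_nochim.rds")

# saveRDS() writes one R object to a compressed binary file
saveRDS(seqtab_nochim, rds_path)

# ...later, possibly in a fresh R session after a restart...
# readRDS() returns the object, which can be assigned to any name
seqtab_reloaded <- readRDS(rds_path)
stopifnot(identical(seqtab_reloaded, seqtab_nochim))
```

Unlike `save()`/`load()`, `readRDS()` does not restore the original variable name; the user assigns the reloaded object explicitly, which makes the reload step easy to follow in a tutorial.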

> I am a bit confused by the last comment. Did you mean there is no trained data in the folder? To download the Silva 128 file, there is a link just before the taxonomy step. I decided to include an R object save there, as it takes a while to load in and train the data in R.

My bad, there is a Silva 128, but we shouldn't be using it. We should use the most up-to-date release (138.1).
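Since the taxonomy step is the slow one, here is a sketch of how its save-and-reload could be wrapped, using a hypothetical helper `cache_rds()` that reuses a saved RDS file when one exists and otherwise runs the step and saves the result. The commented `dada2::assignTaxonomy()` call and the Silva 138.1 training-file name in it are assumptions, not taken from the tutorial; the demo uses a cheap stand-in function so it runs without dada2:

```r
# Hypothetical helper: reuse a saved result if its RDS file exists,
# otherwise evaluate `expr`, save the result, and return it.
cache_rds <- function(path, expr) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- expr  # `expr` is a lazy argument, so the slow step only runs here
  saveRDS(result, path)
  result
}

# Real use might look like this (reference file name is an assumption):
# taxa <- cache_rds("taxa_silva_138.1.rds",
#   dada2::assignTaxonomy(seqtab_nochim, "silva_nr99_v138.1_train_set.fa.gz",
#                         multithread = TRUE))

# Self-contained demo with a cheap stand-in for the slow step
n_runs <- 0
slow_step <- function() {
  n_runs <<- n_runs + 1  # count how many times the "slow" step actually runs
  c(Kingdom = "Bacteria")
}
demo_path <- file.path(tempdir(), "demo_taxa_silva_138.1.rds")
unlink(demo_path)  # start the demo without a cached file

first  <- cache_rds(demo_path, slow_step())  # runs slow_step() and saves
second <- cache_rds(demo_path, slow_step())  # reads the saved file instead
```

Putting the database version in the cache file name is a deliberate choice: otherwise a result saved against Silva 128 would be silently reused after switching to 138.1.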

jordenrabasco commented 2 years ago

updated

[Section] Data management #13