morris-lab / Capybara

Capybara: A computational tool to measure cell identity and fate transitions

count.csv generation #5

Closed: rigitano closed this issue 2 years ago

rigitano commented 2 years ago

Hello,

I'm having a hard time understanding how the count.csv files (step 1.5 on GitHub) were generated from the MCA files at https://figshare.com/articles/MCA_DGE_Data/5435866. Because of their large size, the count.csv files were not provided.

I didn't succeed in generating the count.csv files from the MCA files. Could you explain how this was done? (A single example count.csv might be enough to clarify the procedure.)

Thank you

Danyi-ZHENG commented 2 years ago

It should be the 500dge/batch_rm_dge folders in the MCA link provided at the top of the webpage.

rigitano commented 2 years ago

Hi,

In the MCA link provided at the top of the webpage (https://figshare.com/articles/MCA_DGE_Data/5435866), I could only find folders with similar but different names:

MCA_500more_dge.tar.gz\500more_dge
MCA_BatchRemove_dge.zip\rmbatch_dge

However, instead of a "count.csv" file, they contain several ".txt" files. I've checked the older versions, but couldn't find any count.csv files there either.

This raises two questions:
1. Which of these folders is the right one?
2. How do I convert their .txt content into a count.csv file?

Thank you for your help

Danyi-ZHENG commented 2 years ago

I tried both, and I think the rmbatch one works. You don't actually have to convert the .txt files to .csv: just replace the read.csv call with read.table and make sure the loop goes through the whole folder. You don't have to create exactly the same file format as the one they provide.
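A minimal R sketch of that loop, assuming each .txt file in the rmbatch_dge folder is a whitespace- or tab-delimited matrix with gene names in the first column and one column per cell. The function name read_dge_folder is hypothetical and not part of Capybara; check the first few lines of a file before relying on the layout assumption.

```r
# Hypothetical helper (not part of Capybara): read every .txt count matrix
# in a folder into a named list of data frames.
# Assumes genes in rows, cells in columns, gene names in the first column,
# and whitespace/tab delimiters (read.table's default sep = "").
read_dge_folder <- function(folder, pattern = "\\.txt$") {
  files <- list.files(folder, pattern = pattern, full.names = TRUE)
  counts <- lapply(files, function(f) {
    # row.names = 1 turns the gene column into row names;
    # check.names = FALSE keeps cell barcodes exactly as written
    read.table(f, header = TRUE, row.names = 1, check.names = FALSE)
  })
  # Name each matrix after its file, e.g. "Bladder.txt" -> "Bladder"
  names(counts) <- sub(pattern, "", basename(files))
  counts
}
```

For example, `counts <- read_dge_folder("rmbatch_dge")` would return one data frame per file, assuming the archive was unzipped into a folder of that name.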

KaetheKong commented 2 years ago

Hi,

First of all, apologies for the delay! And thank you for using Capybara!

Thanks for helping respond in this thread! The count.csv files came from one of the older versions of the MCA data and were generated to contain only the genes shared across all the different cell types. I believe the MCA has been updated in recent years, so please use the newer count files. To generate these count files, I would recommend using the rmbatch files and stitching the different replicates together by joining on the shared genes.
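One way to do the "join on shared genes" step in R is sketched below. It assumes each replicate is already loaded as a data frame (or matrix) with genes as row names and cells as columns, and that all elements of the list are the same type. merge_on_shared_genes is a hypothetical name, not a Capybara function, and this is not necessarily how the original count.csv files were built.

```r
# Hypothetical helper (not part of Capybara): stitch several count matrices
# (e.g. replicates or tissues) into one, keeping only genes shared by all.
# Assumes each element has genes as row names and cells as columns.
merge_on_shared_genes <- function(count_list) {
  # Genes present in every matrix (intersection of row names)
  shared <- Reduce(intersect, lapply(count_list, rownames))
  # Subset each matrix to the shared genes, in the same row order
  subsets <- lapply(count_list, function(m) m[shared, , drop = FALSE])
  # unname() so cbind keeps the original cell IDs as column names
  # instead of prefixing them with the list element names
  do.call(cbind, unname(subsets))
}
```

If the per-file matrices are in a list called `counts`, `merge_on_shared_genes(counts)` returns a single genes-by-cells table. If cell IDs repeat across replicates, you may want to prefix them before merging.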

As suggested above, the files don't need to be in the same format. Please use read.csv for comma-delimited files and read.table for tab-delimited files. Let us know if you have any issues reading these files!
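If a folder mixes both kinds of file, a small wrapper can pick the reader for you. This sketch assumes that the file extension reliably indicates the delimiter (.csv for commas, anything else for tabs) and that gene names are in the first column. read_counts is a hypothetical name.

```r
# Hypothetical helper (not part of Capybara): choose the reader from the
# file extension, using read.csv for comma-delimited files and read.table
# for tab-delimited ones. Assumes gene names are in the first column.
read_counts <- function(path) {
  if (grepl("\\.csv$", path, ignore.case = TRUE)) {
    read.csv(path, row.names = 1, check.names = FALSE)
  } else {
    read.table(path, header = TRUE, sep = "\t", row.names = 1,
               check.names = FALSE)
  }
}
```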

Best, Wenjun