robertaboukhalil / ginkgo

Cloud-based single-cell copy-number variation analysis tool
qb.cshl.edu/ginkgo
BSD 2-Clause "Simplified" License
47 stars 28 forks

MAD exact #17

Closed mariakalyva closed 5 years ago

mariakalyva commented 5 years ago

Hello,

I have been using Ginkgo on several data sets and I would like to correlate the MAD values. To do that, I need the exact MAD value for each individual cell. Is there any Ginkgo output that gives the MAD values, apart from the plots (which are only approximate)? If I set up Ginkgo on my own server, will I be able to get the values?

Thank you very much, Maria Kalyva

robertaboukhalil commented 5 years ago

Hi Maria,

You currently can't export MAD values as text, but you can get that info if you set up Ginkgo on your server. Specifically, you could use the code in analyze-subset.R to get the MAD values.

Hope this helps.

Robert

mariakalyva commented 5 years ago

Hi Robert,

thanks a lot for helping with this. I did exactly what you suggested. Looking at the data, apart from the plot, the table I get when I run the script on my own consists of 4 columns. I am referring to the "a" array that is created. Would you mind explaining the columns? Do I need to take the mean of each row to get the correct MAD for each cell? Or should I just take the first column, which is also what is used in the boxplot that Ginkgo creates?

Thanks a lot again, Maria

robertaboukhalil commented 5 years ago

Hi Maria,

Sorry about this, that part of the code isn't very intuitive. The MAD is indeed in variable a, and the columns correspond to:

The boxplot only uses the first column. I think the rest were for testing, so you can ignore them.

mariakalyva commented 5 years ago

Hi Robert,

thanks for replying so fast and explaining.

Best,

Maria

mariakalyva commented 5 years ago

One last question, if you don't mind. Can I use my own reference sample BED file to run Ginkgo? When I do it through the online server, it appears to crash. Any idea why? This analysis has been quite useful for our data and I want to make the most of it.

Thanks for your time again, Maria

robertaboukhalil commented 5 years ago

Can you send me the Ginkgo link you're using for the analysis where it crashes?

mariakalyva commented 5 years ago

Hi Robert,

yes of course, here is the link: http://qb.cshl.edu/ginkgo/?q=results/wTtrLttGP2GtzMTlS8Mp. For some reason it appears to get "stuck" at 66%. Thank you, Maria

robertaboukhalil commented 5 years ago

Hi Maria,

I've tried a few things and so far my hypothesis is that your BED files contain chromosomes that Ginkgo is not expecting, e.g. chr1_gl000192_random.

Can you try again, but with BED files without those *_gl* chromosomes?

Here's some bash code to do that. It assumes your .bed.gz files are in the folder data/, and it'll create new .bed.gz files in the data2/ folder with the lines containing _gl removed.

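# Create the output folder (no error if it already exists)
mkdir -p data2/
for file in data/*.bed.gz
do
  # Drop lines containing "_gl" and recompress the rest into data2/
  zgrep -v "_gl" "$file" | gzip > "data2/$(basename "$file")"
done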

I ran a few tests and it seems to be working but let me know if it doesn't
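
To double-check the filtered files before uploading them, it can help to list which chromosome names are still present. The snippet below is only a sketch: list_chroms is a hypothetical helper name (it is not part of Ginkgo), and it assumes the chromosome name is the first tab-separated column of each .bed.gz file.

```shell
# Hypothetical helper (not part of Ginkgo): list_chroms DIR
# Prints every distinct chromosome name found in DIR/*.bed.gz,
# preceded by the number of lines it appears on.
# Assumes the chromosome name is the first tab-separated column.
list_chroms() {
  # gzip -cd decompresses all files to stdout; cut keeps column 1;
  # sort | uniq -c counts how often each name occurs.
  gzip -cd "$1"/*.bed.gz | cut -f1 | sort | uniq -c
}
```

Running list_chroms data2 should then show only the regular chromosomes, and list_chroms data2 | grep "_gl" should print nothing if the filtering worked.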

mariakalyva commented 5 years ago

Hi Robert,

it seems to work fine. Thank you very much for your time.

Maria

robertaboukhalil commented 5 years ago

Great!

srbehera commented 1 year ago

What should the content of the "data" file be? I put in all the .bed.gz files and ran analyze-subset.R new_run_123 analysis hg38 variable_500000_101_bowtie original, but I am getting the following error:

Error in FUN(left, right) : non-numeric argument to binary operator
Calls: sweep -> Ops.data.frame -> eval -> eval
Execution halted
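
In R, this error from Ops.data.frame means that at least one column of the data frame being processed is not numeric, so one thing worth checking is whether the input table contains any non-numeric entries. The snippet below is a rough sketch rather than a diagnosis: report_non_numeric is a hypothetical helper name, and it assumes the file is a tab-separated table with one header row followed by numeric values, which this thread does not confirm.

```shell
# Hypothetical helper (not part of Ginkgo): report_non_numeric FILE
# ASSUMPTION: FILE is tab-separated, with one header row followed by
# rows that should contain only numbers.
# Prints the location of up to 10 fields that are not plain numbers.
report_non_numeric() {
  awk -F'\t' '
    NR > 1 {                      # skip the header row
      for (i = 1; i <= NF; i++)
        if ($i !~ /^-?[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?$/) {
          printf "line %d, column %d: \"%s\"\n", NR, i, $i
          if (++bad >= 10) exit   # stop after 10 reports
        }
    }
  ' "$1"
}
```

If report_non_numeric data prints nothing, every field after the header looks numeric under these assumptions and the cause is probably elsewhere. Empty fields and stray carriage returns from Windows line endings are also reported.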