griffithlab / GenVisR

Genome data visualizations
Creative Commons Zero v1.0 Universal

maximum input for cnFreq #349

Open blsfoxfox opened 5 years ago

blsfoxfox commented 5 years ago

Hi,

Could you please let me know the maximum number of CNV fragments that I can feed into cnFreq? I have tried something over 10,000 and it gives an error:

Did not detect identical genomic segments for all samples ...Performing disjoin operation
Detected a large data size, converting to SimpleList for disjoin
Error in getListElement(x, i, ...) : GRanges objects don't support [[, as.list(), lapply(), or unlist() at the moment

The code works for the same dataset when it has only 6,777 rows in it.

Thanks, Bob

zlskidmore commented 5 years ago

@blsfoxfox can you give me the version of GenVisR you're using? I think this is probably related to #312 and #328. Those issues caused cnFreq to essentially crash because R could not handle such a large data size without converting the object to a SimpleList. The problem with a SimpleList is that computations on that object take much longer than normal. As a compromise, the SimpleList object is only created if there are > 1000 entries for a given chromosome; however, this cutoff was arbitrary and was only tested on a 2013 MacBook Pro.

From your error, something must have changed with SimpleList and its interaction with GRanges. I'll have to look into it, but to answer your question: as long as there are fewer than 1000 entries for a given chromosome, everything should work as normal.
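
The per-chromosome cutoff described above can be checked on an input table before calling cnFreq. The following is a minimal sketch, written in Python for illustration only (GenVisR itself is an R package, and this is not its code). The function name, the constant name, and the row layout (dicts with a "chromosome" key) are assumptions; the value 1000 is the cutoff quoted in the comment.

```python
from collections import Counter

# Cutoff quoted in the comment above; it is described there as arbitrary.
SIMPLELIST_THRESHOLD = 1000


def chromosomes_over_threshold(segments, threshold=SIMPLELIST_THRESHOLD):
    """Return {chromosome: row_count} for chromosomes with more than `threshold` rows.

    `segments` is an iterable of dicts with at least a "chromosome" key,
    a hypothetical stand-in for the rows of a cnFreq input table.
    """
    counts = Counter(str(seg["chromosome"]) for seg in segments)
    return {chrom: n for chrom, n in counts.items() if n > threshold}


# Toy input: 1001 rows on chromosome 17 and exactly 1000 rows on chromosome 1.
example = (
    [{"chromosome": "17", "start": i, "end": i + 1} for i in range(1001)]
    + [{"chromosome": "1", "start": i, "end": i + 1} for i in range(1000)]
)
print(chromosomes_over_threshold(example))
```

Only chromosomes strictly above the cutoff are reported, matching the "> 1000" condition in the comment, so the chromosome with exactly 1000 rows is not flagged.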

blsfoxfox commented 5 years ago

Hi Zach,

The version of GenVisR is 1.14.1. I've read #312 and #328, but I thought 10,000 rows would be a reasonable amount to deal with. Anyway, thanks for your help; I really look forward to any updates.

Thanks, Bob


zlskidmore commented 5 years ago

How many data points do you have in total for a given chromosome, just out of curiosity? I'm happy to walk you through cloning this repo and modifying the piece of code that forces coercion to a SimpleList.

Unfortunately, the type of machine used is what will really determine how many entries can be handled before coercion to a SimpleList is needed.

After thinking about this last night, I think the best approach going forward will be to write some code that not only splits the data by chromosome (as is currently done) but also splits within each chromosome at points where there is no overlap among samples. This essentially splits the data set into even smaller chunks that can be processed more efficiently.

I'll have to think about how to accomplish this, but I'll leave this open for now as a feature request.
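
One way to picture the splitting idea above is a sweep over the sorted segments of a single chromosome that starts a new chunk whenever a segment begins after every earlier segment has ended. This is a minimal sketch in Python for illustration only, not GenVisR's implementation (GenVisR is an R package). The function name is hypothetical, segments from all samples are assumed to be pooled as (start, end) pairs, and coordinates are assumed to be inclusive.

```python
def split_into_independent_chunks(segments):
    """Group (start, end) segments from one chromosome into chunks so that
    no segment in one chunk overlaps a segment in any other chunk.

    Coordinates are treated as inclusive (an assumption), so a segment that
    starts exactly where the current chunk ends stays in that chunk.
    """
    chunks = []
    current = []
    current_end = None  # largest end coordinate seen in the current chunk
    for start, end in sorted(segments):
        if current and start > current_end:
            # Gap with no coverage from any sample: close the current chunk.
            chunks.append(current)
            current = []
            current_end = None
        current.append((start, end))
        current_end = end if current_end is None else max(current_end, end)
    if current:
        chunks.append(current)
    return chunks


pooled = [(200, 300), (1, 100), (400, 500), (50, 150), (250, 260)]
for chunk in split_into_independent_chunks(pooled):
    print(chunk)
```

Because no segment crosses a chunk boundary, each chunk could in principle be processed on its own, which keeps the number of entries handled at one time small. Whether this fits into cnFreq's internals is not something the thread settles.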

blsfoxfox commented 5 years ago

chromosome  number of rows
1           688
2           430
3           616
4           882
5           553
6           639
7           273
8           500
9           479
10          249
11          676
12          467
13          713
14          331
15          427
16          413
17          1616
18          335
19          499
20          178
21          208
22          564
X           499

It seems only chromosome 17 has more than 1,000 rows. Maybe I can manually change the threshold to some bigger number?

Thanks, Bob
