Open euriehong opened 13 years ago
Yes, was referring to the results page.
What if we do this for "paste" input but not file input? I don't think this will work with the file input (designed for large datasets). The app doesn't have any knowledge of prior input (when a file is uploaded); it just writes to disk now. We might be able to do a "dedupe" when files are exported, since we have to do a sort and file conversion anyway.
I have been stalling on this, but I think it's actually sort of critical. Uploading any kind of BED or GFF with regions is dumb. I think a minimal hash will only add a few MB to the RAM footprint.
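For what it's worth, a rough sketch of the dedupe-on-export idea, assuming the exported rows are already sorted so identical rows sit next to each other. The name `dedupe_sorted` and the plain-string row format are made up for illustration; this is not the app's actual export code.

```python
from itertools import groupby
from typing import Iterable, Iterator


def dedupe_sorted(rows: Iterable[str]) -> Iterator[str]:
    """Yield each distinct row once, assuming identical rows are adjacent.

    Hypothetical helper: relies on the sort that export already performs,
    so it only ever holds one row in memory at a time.
    """
    for row, _group in groupby(rows):
        yield row


# Example: two identical range rows collapse to one after sorting.
rows = [
    "14\t100705101\t100705102",
    "14\t100705101\t100705102",
    "2\t500\t600",
]
unique_rows = list(dedupe_sorted(sorted(rows)))
```

Since the rows stream through one at a time, this should not add to the RAM footprint for large files.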
Eurie: Can the results be uniqued if there are duplicates? For example, entering the same input will result in duplicate rows. Try
14 100705101 100705102 14 100705101 100705102
OR
rs78077282 rs78077282
Ben's response: I suppose so ... but I think we have to do it at the "output snp" level rather than the input. Because if 2 overlapping ranges are entered the input will look the same but some of the output will be duplicated. The ranges are first converted to lists of valid (common) snps and then looked up for their scores.
I didn't do this originally because I was concerned with speed/memory for very large input sets... but since those aren't functioning now anyway, I think it would be reasonable to track if a given SNP is entered and not look it up/report it another time.
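Something along these lines is what I have in mind: a minimal sketch, assuming SNPs are identified by rsID strings and that scores come from some lookup function. Both `unique_snp_scores` and `lookup_score` are hypothetical names, not functions in the current code.

```python
from typing import Callable, Iterable, Iterator, Set, Tuple


def unique_snp_scores(
    snp_ids: Iterable[str],
    lookup_score: Callable[[str], float],
) -> Iterator[Tuple[str, float]]:
    """Look up and report each SNP only the first time it appears.

    `lookup_score` is a stand-in for whatever performs the real score lookup.
    """
    seen: Set[str] = set()  # SNP IDs already reported
    for snp_id in snp_ids:
        if snp_id in seen:
            continue  # already looked up; skip both the lookup and the report
        seen.add(snp_id)
        yield snp_id, lookup_score(snp_id)


# Example: the duplicated rsID from the report above is looked up once.
calls = []

def fake_lookup(snp_id: str) -> float:
    calls.append(snp_id)  # record each lookup so the skip is visible
    return 0.5  # placeholder score, not real data

results = list(unique_snp_scores(["rs78077282", "rs78077282"], fake_lookup))
```

Because the check happens after ranges are expanded to SNPs, overlapping ranges get deduplicated too. The cost is one set entry per distinct SNP, so memory grows with the number of unique SNPs in the input.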