ssc-oscar / lookup

A mirror of bitbucket.org/swcs/lookup

Getting blobs shared by two projects #31

Closed: sylviesworld closed this issue 3 years ago

sylviesworld commented 3 years ago

There are cases where running mostShared.sh on a project returns nothing, but running zcat /da5_data/basemaps/gz/search.out | grep REPO | sort -t\; -n -k7 lists projects that use a blob originating from that project. Is there a good way to determine which blob they share, so that I can see what files were copied?

An example would be: zcat /da5_data/basemaps/gz/search.out | grep bitbucket.org_thekswenson_alpha | sort -t\; -n -k7

returns:
bitbucket.org_thekswenson_alpha;o;15;3;4939;4728;1;genomecuration_JAMg;u;402;1111;27044;10652;0
bitbucket.org_thekswenson_alpha;o;15;3;4939;4728;1;sestaton_tephra;o70;29;8;5429;4419;0
bitbucket.org_thekswenson_alpha;o;15;3;4939;4728;2;bitbucket.org_thekswenson_phagerecombination;o;1;0;195;191;0

How do I determine what blob bitbucket.org_thekswenson_alpha and genomecuration_JAMg share?
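A plain grep REPO also matches any project whose name merely contains REPO. A minimal sketch of an exact-match filter follows, assuming (from the sample lines only) that field 1 is the queried project, field 8 is the other project, and field 7 is the value the command above sorts on. The function name paired_projects is hypothetical, and the sample records are inlined so the snippet runs without access to /da5_data.

```shell
#!/bin/sh
# Hypothetical helper: list the projects paired with one repo in
# search.out-style records read from stdin.
# Assumption: field 1 = queried project, field 8 = other project,
# field 7 = the sort key used in the thread (its meaning is not documented here).
paired_projects() {
  repo=$1
  # Compare field 1 exactly, so longer names containing $repo are skipped.
  awk -F';' -v repo="$repo" '$1 == repo { print $8 ";" $7 }' |
    sort -t';' -k2,2n
}

paired_projects bitbucket.org_thekswenson_alpha <<'EOF'
bitbucket.org_thekswenson_alpha;o;15;3;4939;4728;1;genomecuration_JAMg;u;402;1111;27044;10652;0
bitbucket.org_thekswenson_alpha;o;15;3;4939;4728;1;sestaton_tephra;o70;29;8;5429;4419;0
bitbucket.org_thekswenson_alpha;o;15;3;4939;4728;2;bitbucket.org_thekswenson_phagerecombination;o;1;0;195;191;0
EOF
```

Against the real file this would be used as zcat /da5_data/basemaps/gz/search.out | paired_projects REPO.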

audrism commented 3 years ago

mostShared.sh returns only blobs that are in 20 or more projects; in this case the shared blob is in one project only.

You can modify mostShared.sh (though it will be much slower). Take this line

join -v1 $i.fb $i.badfb | ~/lookup/getValues b2ManyP | ~/lookup/lsort 10G -t\; -k2 -n | head | sort -t\; -k1 > $i.fb2n

and change it to

join -v1 $i.fb $i.badfb | ~/lookup/getValues b2P | awk -F\; '{print $1";"(NF-1)}' | ~/lookup/lsort 10G -t\; -k2 -n | sort -t\; -k1 > $i.fb2n
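To illustrate what the added awk step does, here is a small sketch of the counting stage on its own. It assumes that getValues b2P emits one line per blob, with the blob id followed by the semicolon-separated projects that contain it (the NF-1 in the command above suggests this, but it is not confirmed here). The blob ids and project names are made up, the function name count_projects is hypothetical, and plain sort stands in for ~/lookup/lsort.

```shell
#!/bin/sh
# Hypothetical helper: turn "blob;proj1;proj2;..." lines into "blob;count",
# ordered by ascending project count.
count_projects() {
  # NF - 1 = number of fields after the blob id, i.e. the number of projects.
  awk -F';' '{ print $1 ";" (NF - 1) }' | sort -t';' -k2,2n
}

# Made-up input for illustration only.
count_projects <<'EOF'
blobA;proj1;proj2;proj3
blobB;proj1
EOF
```

Dropping head from the original line keeps every blob rather than only the first ten after sorting, which is presumably part of why the modified version is slower.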

How do I determine what blob bitbucket.org_thekswenson_alpha and genomecuration_JAMg share?

~/lookup/cmpO.sh bitbucket.org_thekswenson_alpha genomecuration_JAMg share

produces

comparing bitbucket.org_thekswenson_alpha and genomecuration_JAMg
1 blobs created in bitbucket.org_thekswenson_alpha used in genomecuration_JAMg
0 blobs created in genomecuration_JAMg used in bitbucket.org_thekswenson_alpha
111 shared between bitbucket.org_thekswenson_alpha and genomecuration_JAMg
4828 blobs unique to bitbucket.org_thekswenson_alpha
26933 blobs unique to genomecuration_JAMg
created in bitbucket.org_thekswenson_alpha and present in genomecuration_JAMg
7a676be00044e5aa8fffd49d793ae9766cacf396;3rd_party/bin/gt
created in genomecuration_JAMg and present in bitbucket.org_thekswenson_alpha
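For scripting against this output, the blob lines can be tagged with the direction given by the preceding "created in ... and present in ..." header. This is a minimal sketch that assumes the output layout is exactly as shown above; the function name shared_blobs is hypothetical, and the sample output is inlined so the snippet runs without cmpO.sh.

```shell
#!/bin/sh
# Hypothetical helper: read cmpO.sh-style output on stdin and print
# "origin;destination;blob;path" for every shared blob line.
shared_blobs() {
  awk '
    # Header line: remember which project created the blobs that follow.
    /^created in .* and present in / { from = $3; to = $7; next }
    # Blob line: hex id, then ";", then the file path.
    /^[0-9a-f]+;/ && from != "" { print from ";" to ";" $0 }
  '
}

shared_blobs <<'EOF'
comparing bitbucket.org_thekswenson_alpha and genomecuration_JAMg
1 blobs created in bitbucket.org_thekswenson_alpha used in genomecuration_JAMg
0 blobs created in genomecuration_JAMg used in bitbucket.org_thekswenson_alpha
created in bitbucket.org_thekswenson_alpha and present in genomecuration_JAMg
7a676be00044e5aa8fffd49d793ae9766cacf396;3rd_party/bin/gt
created in genomecuration_JAMg and present in bitbucket.org_thekswenson_alpha
EOF
```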