lgatto / pRolocdata

Data accompanying the pRoloc package

Human markers #21

Closed lgatto closed 7 years ago

lgatto commented 7 years ago

@lmsimp - do you know if we have incorporated the updated human markers curated by Claire and Katerina?

lmsimp commented 7 years ago

No, I don't think so. This should be updated ASAP. The current markers are very old and out of date. Claire has a recent set she has curated for her human THP cells; I think these are consistent with her U2-OS markers.

lgatto commented 7 years ago

@ClaireMulvey - could you help us with this and send us these markers? We can chat next time we are all in together.

ClaireMulvey commented 7 years ago

Hi both

here are the latest human markers. I have attached two files.

My list (CM) is quite stringent. Katerina's marker set (AG) is more lenient but more extensive. She allows for peripherally-associated membrane proteins, whereas I prefer more unambiguous integral proteins.

Another point to note is that Katerina's list allows "cytoplasm" in the Cytosol cluster, whereas I have tried to keep this mainly to "cytosolic" proteins.

Katerina's list also includes sub-nuclear annotations, which I didn't use - I have a single "nuclear" annotation (I don't do chromatin enrichment).

And of course, although the two lists are highly overlapping they are both optimised for different datasets, so there is some bias towards the context-specific needs of each dataset :)

In terms of uploading the file to pRolocdata, a merged version of both files is probably best. The user can then decide to filter the list for their own specific needs.

Have a look at attached and see what you think! If you go with Katerina's file, maybe just check with her that this is the most up-to-date version of her markers.

Claire (PS - I couldn't upload csv files to this github?!)

human_markers_AG_2016.txt

human_markers_CM_Nov_2016.txt

lgatto commented 7 years ago

Fantastic, @ClaireMulvey, thank you very much!

lmsimp commented 7 years ago

I would perhaps use Claire's more stringent list. If we use a combined version of both lists, I would suggest not using any markers that Katerina calls 'Cytoplasm' or 'Cytosol'. As @ClaireMulvey says, Katerina has used these terms very generally, for proteins that have been annotated with the generic term 'Cytoplasm', and this quite often results in labelling everything!

Hope that makes sense!
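A minimal sketch of the merge rule described above, in plain Python rather than pRoloc code. Everything here is an assumption for illustration: the protein IDs, the labels, the dict layout (the attached files may be structured differently) and the `merge_markers` helper itself. It also assumes that when a protein appears in both lists, the label from the stringent (CM) list wins.

```python
# Hypothetical marker tables: protein ID -> sub-cellular location label.
# IDs and labels are made up; the real attached files may look different.
cm_markers = {"P1": "Mitochondrion", "P2": "Cytosol", "P3": "Nucleus"}
ag_markers = {
    "P1": "Mitochondrion",
    "P4": "Cytoplasm",
    "P5": "Cytosol",
    "P6": "Nucleus - Chromatin",
    "P7": "Plasma membrane",
}

# Labels from the lenient (AG) list that should not be carried over.
EXCLUDED_AG_LABELS = {"cytoplasm", "cytosol"}


def merge_markers(stringent, lenient, excluded=EXCLUDED_AG_LABELS):
    """Merge two marker dicts, skipping lenient entries with an excluded label.

    Assumption: if a protein is in both lists, the stringent label is kept.
    """
    merged = dict(stringent)  # start from a copy of the stringent list
    for protein, label in lenient.items():
        if label.strip().lower() in excluded:
            continue  # drop the generic 'Cytoplasm' / 'Cytosol' markers
        merged.setdefault(protein, label)  # add only proteins not already present
    return merged


merged = merge_markers(cm_markers, ag_markers)
```

With this rule, the cytosolic markers from the stringent list are kept, while the 'Cytoplasm' and 'Cytosol' entries from the lenient list are left out.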

lgatto commented 7 years ago

I'll go for @ClaireMulvey's list.

ClaireMulvey commented 7 years ago

If there is time, I can go through and merge the best of both lists. Although I am happy with my more stringent list, maybe other people think I am overly restrictive. Katerina is definitely more relaxed, but I think this can be a bit of a risky strategy. I prefer to be strict with markers and then more permissive when going through the classification results. C

lmsimp commented 7 years ago

I think it's better to be cautious when defining markers. Especially if these markers are to be (potentially) used by others. I would just go for the stringent list.

ClaireMulvey commented 7 years ago

I think my set is more stringent, but we have to bear in mind that my markers look great on my dataset but not as good as K's markers on the U2-OS dataset. So each dataset is a bit biased. I can send you a merged version tomorrow if that is not too late...

lmsimp commented 7 years ago

Yep I agree - in fact we just tried your markers on Andy's old dataset and they looked really good so thought this was a good check! If you'd prefer a merged version to be distributed that's fine, just more work for you :-)

lgatto commented 7 years ago

This has now been committed to pRoloc version 1.15.4 (devel version) https://github.com/lgatto/pRoloc/commit/c6cf12f2a2a8acb96033fe357409335089a950f5