josiah-wolf-oberholtzer / discograph

Social Graphing for the Discogs Database
MIT License
74 stars 11 forks

Special Handling for “Not On Label” #79

Closed delucis closed 8 years ago

delucis commented 8 years ago

Some Discogs entries use the label “Not On Label” — http://www.discogs.com/label/1818 — for self-released albums etc., e.g. http://discograph.mbrsi.org/artist/2477991

In the graph the connections through this label might be misleading and could perhaps be filtered out of the results?

(There may also be other similar holding entities for all I know…)

josiah-wolf-oberholtzer commented 8 years ago

I'm torn on this. It's certainly possible to strip out these virtual entities (e.g. "Not On Label", the "Not On Label" variations, and the pseudo-artist "Various"), and in some earlier versions of the site I did just that. On the other hand, I do think it's really useful to know that entities have been released via white-label 12"s - certainly just as useful as knowing that 250k artists have been released on some major label.

I would be amenable to providing a toggle for white labels, just like providing a toggle for all labels (as some users simply don't want to see / know about labels). Because I pre-calculate the number of relations of each type for all entities to optimize the graph search algorithm (see here), that pre-calculation logic would need to be extended to count white-listed and black-listed entity relations separately. Doable. Likewise, the graph search algorithm would need to be extended with flags toggling white labels on and off.
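For illustration only, here is a minimal Python sketch of what counting the two kinds of relations separately might look like. It is not the project's actual code: the `(kind, id)` entity tuples, the `PSEUDO_ENTITIES` set, and the function names `count_relations` and `relation_count` are all assumptions made for this example. Only the label id 1818 ("Not On Label") comes from this thread.

```python
from collections import Counter

# Hypothetical set of pseudo-entities; only label 1818 ("Not On Label")
# is taken from the thread, and the tuple representation is an assumption.
PSEUDO_ENTITIES = {("label", 1818)}


def count_relations(relations, pseudo_entities=PSEUDO_ENTITIES):
    """Count relations per entity and role, in two separate buckets.

    `relations` is an iterable of (entity_a, role, entity_b) triples.
    A relation is counted in the `pseudo` bucket of an entity when the
    entity at the other end is a pseudo-entity, else in `regular`.
    """
    regular, pseudo = {}, {}
    for a, role, b in relations:
        for entity, other in ((a, b), (b, a)):
            bucket = pseudo if other in pseudo_entities else regular
            bucket.setdefault(entity, Counter())[role] += 1
    return regular, pseudo


def relation_count(entity, regular, pseudo, include_pseudo=True):
    """Total relation count for an entity, with a toggle for pseudo-entities."""
    total = sum(regular.get(entity, Counter()).values())
    if include_pseudo:
        total += sum(pseudo.get(entity, Counter()).values())
    return total


# Tiny made-up data set: two artists, one real label, one pseudo-label.
example_relations = [
    (("artist", 1), "Released On", ("label", 1818)),
    (("artist", 1), "Released On", ("label", 5)),
    (("artist", 2), "Released On", ("label", 1818)),
]
regular_counts, pseudo_counts = count_relations(example_relations)
```

With both buckets stored, a search could pick the count that matches the user's toggle without recomputing anything at query time.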

delucis commented 8 years ago

Perhaps you could display the white-label/pseudo entities immediately connected to the node displayed, but toggle connections beyond that level to off by default? You could see that an artist has released 3 white-label recordings, but not need to know how that connects with the thousands of other white-label artists (but have the option still open?)

On the other hand, this is also a kind of database structure issue — Discogs itself might be better off having genuine special entities for this purpose rather than pseudo labels and artists. In which case you could shrug and accept this! :boom:

josiah-wolf-oberholtzer commented 8 years ago

With regard to what connections you see when "Not On Label" appears in a graph: the graph search algorithm doesn't look for additional connections when it encounters "Not On Label". That is, as the search process iterates over each entity, it looks up that entity's relations and adds any unvisited entities in those relations to a stack of to-be-visited entities. If it encounters an entity whose pre-calculated relation count exceeds the number of relations permitted in one subgraph (about 300), it skips looking up relations and simply marks that entity as visited. "Not On Label" has about 88312 relations, so you should never see extra connections to it unless you clicked on it directly.

All of that is to say, if you see "Not On Label" with multiple relations in one graph, that's because the entities connected to it in the graph were arrived at by other means, not because "Not On Label" has an incredible number of relations in the database.
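The behaviour described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual discograph implementation: the function name `build_subgraph`, the adjacency-dict input, and the toy data are hypothetical, and the sketch leaves out any other limits the real search applies. The threshold of 300 is the approximate figure mentioned above.

```python
def build_subgraph(start, adjacency, relation_counts, max_relations=300):
    """Stack-based search that refuses to expand oversized entities.

    `adjacency` maps an entity to its related entities, and
    `relation_counts` holds the pre-calculated relation count per entity.
    Both structures are assumptions made for this sketch.
    """
    visited = set()
    edges = set()
    stack = [start]
    while stack:
        entity = stack.pop()
        if entity in visited:
            continue
        visited.add(entity)
        # An oversized entity is marked as visited but not expanded,
        # unless it is the entity the user clicked on directly.
        if entity != start and relation_counts.get(entity, 0) > max_relations:
            continue
        for neighbor in adjacency.get(entity, ()):
            edges.add(frozenset((entity, neighbor)))
            if neighbor not in visited:
                stack.append(neighbor)
    return visited, edges


# Toy data: artists A and B share a real label L1; A, B and C are all
# on the pseudo-label NOL, whose pre-calculated count is huge.
adjacency = {
    "A": ["NOL", "L1"],
    "B": ["L1", "NOL"],
    "C": ["NOL"],
    "L1": ["A", "B"],
    "NOL": ["A", "B", "C"],
}
relation_counts = {"A": 2, "B": 2, "C": 1, "L1": 2, "NOL": 88312}

visited_from_a, edges_from_a = build_subgraph("A", adjacency, relation_counts)
visited_from_nol, _ = build_subgraph("NOL", adjacency, relation_counts)
```

Starting from `A`, the search reaches `B` through `L1`, so `B`'s link to `NOL` is drawn, while `C`, which is reachable only through `NOL`, never appears. Starting from `NOL` itself expands it and does reach `C`.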

delucis commented 8 years ago

Ohhhhhhh… right, sorry! That’s actually clear from my Lady Leshurr example above — the only “Not On Label” artists included are all also present via other labels/relations. OK, so I think it’s safe to scrap this line of thought entirely. Sorry for the diversion. :car: :car: :car: