Open Malabarba opened 9 years ago
This is needed for #251
I'll see if I can't do this over the weekend. I've spent all week working with indexing massive datasets; this seems like an essentially similar problem.
My immediate thought is to just introduce one more layer across the board:
(("main-tag-A" "synonym-1" "synonym-2" "synonym-3")
("main-tag-B")
("main-tag-C" "synonym-a") …)
It would come at a relatively nominal cost of space, but I don't think it's going to get any better. Thoughts?
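To make the proposed shape concrete, here is a minimal sketch of how a lookup over it might work. The names `my-tag-table`, `my-tag-canonical`, and `my-tag-main-tags` are hypothetical and not part of the package; the data is just the placeholder example from above.

```elisp
;; Hypothetical sample data in the proposed shape: each entry is
;; (MAIN-TAG SYNONYM...).
(defvar my-tag-table
  '(("main-tag-A" "synonym-1" "synonym-2" "synonym-3")
    ("main-tag-B")
    ("main-tag-C" "synonym-a"))
  "Sample tag table; the names are placeholders, not real tags.")

(defun my-tag-canonical (name table)
  "Return the main tag in TABLE that NAME is, or is a synonym of.
Return nil if NAME does not appear in TABLE."
  (catch 'found
    (dolist (entry table)
      ;; `member' compares with `equal', so it works on strings.
      (when (member name entry)
        (throw 'found (car entry))))))

(defun my-tag-main-tags (table)
  "Return only the main tags of TABLE, in their original order."
  (mapcar #'car table))
```

With this layout, code that only needs the main tags pays one extra `car` per entry, and synonym resolution is a linear scan.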
My only afterthought is the potential wisdom in using vectors, but I forget exactly how we're using this structure and whether vectors would be effective without much hassle. IIRC, vectors have a significantly faster implementation due to being sequential.
I think it's fine. While we're doing backwards incompatible changes, might as well try printing the strings as symbols to save a bit of space.
The only problem I see is that tags containing `.` would be printed with a `\.`, but there's a variable that controls that.
I've tried
(print-escape-nonascii
print-charset-text-property
print-length
print-level
print-circle
print-escape-multibyte
print-continuous-numbering
print-escape-newlines
print-gensym
print-quoted)
but I can't seem to find the variable you're talking about. (That's almost every variable that begins with `print`.)
There's always `sed` if we can't find the variable.
No, sorry. As long as we use `princ` we're fine. The whole point of `princ` is that it doesn't quote characters.
Yep:
(princ 'hi.there (current-buffer))
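Extending that to the whole structure, a rough sketch of printing the nested list as symbols might look like the following. `my-tags-to-string` is a hypothetical helper and the tag names are only sample input; this shows the printing side only and does not check that every tag name can be read back as a symbol.

```elisp
(defun my-tags-to-string (table)
  "Return TABLE printed with `princ', after interning each tag name.
TABLE is a list of lists of strings, (MAIN-TAG SYNONYM...)."
  (let ((symbols (mapcar (lambda (entry) (mapcar #'intern entry))
                         table)))
    ;; `princ' prints for humans: no quoting or escaping of characters.
    (with-output-to-string (princ symbols))))

;; Example with sample names containing dots:
;; (my-tags-to-string '(("asp.net" "aspnet") ("node.js")))
;; → "((asp.net aspnet) (node.js))"
```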
Also, I believe the tags are already returned to us in order of frequency.
Yes, `popular` is the default sort: http://api.stackexchange.com/docs/tags, but another thought: As a tag's popularity fluctuates, its position in the list will alter as well. This will increase the diff size. I thought about including the `count` property of the information as well, but this would change nearly every time we pull data and the repo history will increase without bound.
Long story short, the sorting of the printed list is going to change from alphabetic to popularity, but expect a rise in repo growth over time (unless you have a better option). I'm starting to wonder if we really should be tracking this stuff – they aren't changes we're making.
Perhaps… perhaps we can do a git rewrite every time we push to `data`? It's late and I may not be thinking clearly – I don't know what adverse effects this may have on clones, but nobody should be making changes to the `data` branch anyways.
Hm, I like that idea. A few points:
`bot` branch (which I'm really not concerned with).
`bot-3.0/`), so that people using `sx-2.0` don't get confronted with errors.
Two things I think the tag bot should do: