twitter-archive / twitter-text-conformance

Conformance testing data for the twitter-text-* repositories

Guidelines for new conformance tests #26

Open garu opened 12 years ago

garu commented 12 years ago

I'm writing a new twitter-text library and it already passes all tests in 'twitter-text-conformance', but I'm not really confident it's correct. I say that because I feel a lot of unit tests are missing.

Take the username validation in validate.yml, for example. It has a test that says "valid username: a-z < 20 characters" and another one saying "All numeric username are allowed". How about mix-and-match? Is 20 the overall username size limit? How about Unicode? Is "@" a valid username? Should the Unicode "@" also be considered a valid username marker?
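The open questions above can be written down as a list of candidate inputs to feed to whichever implementation ends up being treated as authoritative. The following is only a sketch: the class name `UsernameCandidates`, the case labels, and the choice of U+FF20 (fullwidth commercial at) as the "Unicode @" are assumptions made for illustration, and no expected results are given because the thread does not settle them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UsernameCandidates {
    // Builds a string of n copies of c (kept explicit for older Java versions).
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Candidate inputs only. Whether each one is valid is exactly the
    // open question, so no expected value is recorded here.
    static Map<String, String> candidates() {
        Map<String, String> cases = new LinkedHashMap<>();
        cases.put("letters, 19 chars", "@" + repeat('a', 19));
        cases.put("letters, 20 chars", "@" + repeat('a', 20));
        cases.put("letters, 21 chars", "@" + repeat('a', 21));
        cases.put("mixed letters and digits", "@abc123");
        cases.put("marker only", "@");
        // Assumption: "Unicode @" means the fullwidth commercial at, U+FF20.
        cases.put("fullwidth marker", "\uFF20abc");
        cases.put("non-ASCII letter", "@caf\u00E9");
        return cases;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : candidates().entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Running each candidate through a reference implementation and recording what it returns would give a first draft of the missing test cases, to be reviewed by someone who can confirm the intended behavior.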

I had a lot of questions like that while I coded, and I still do. I'd love to volunteer and write all those tests, but I'm not the authority here so I can't pick what's valid and what's not off the top of my head - nor am I willing to try posts on my own Twitter account just for testing purposes (my followers would go crazy :) ).

tl;dr - Is there an implementation I can use as "correct"? This way I can use it as authoritative and see whether the new tests pass or fail.

Thanks!

keitaf commented 12 years ago

Hi @garu

Here are the "official / reference" implementations of twitter-text:
https://github.com/twitter/twitter-text-rb
https://github.com/twitter/twitter-text-js
https://github.com/twitter/twitter-text-java

They're being updated frequently so we cannot say the current implementations define the "correct" behaviors, but they are the ones currently used in the productions.

ablick commented 12 years ago

Can you please clarify what you mean by "they are the ones currently used in the productions"? If they are used in production on Twitter itself, then isn't that the de facto "correct" behavior?

I ask because I've found a scenario where the twitter-text-java implementation behaves differently from the text box on Twitter.com. If you enter " http://google.com ", where the spaces before and after the URL are Unicode non-breaking space characters (U+00A0, written \u00A0 in Java), then the text box on Twitter.com will find the link, count it as 20 characters, and display "Link will appear shortened." But when you pass that same string through the twitter-text-java library, Extractor.extractURLs() won't find the URL.
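One way a discrepancy like this can arise in Java is that the default regex class `\s` does not include U+00A0. The sketch below illustrates that with a deliberately simplified, hypothetical URL pattern; it is not the pattern used by twitter-text-java, and the class name `NbspUrlDemo` and both pattern constants are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NbspUrlDemo {
    // Hypothetical, simplified URL body. NOT the twitter-text-java pattern.
    private static final String URL = "https?://[^\\s\\u00A0]+";

    // Requires start of input or an ASCII whitespace character before the URL.
    static final Pattern ASCII_BOUNDARY =
            Pattern.compile("(?:^|\\s)(" + URL + ")");

    // Also accepts a non-breaking space (U+00A0) before the URL.
    static final Pattern NBSP_BOUNDARY =
            Pattern.compile("(?:^|[\\s\\u00A0])(" + URL + ")");

    // Collects capture group 1 (the URL itself) for every match.
    static List<String> findUrls(Pattern pattern, String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // The URL is surrounded by non-breaking spaces, as in the report.
        String text = "\u00A0http://google.com\u00A0";

        System.out.println(findUrls(ASCII_BOUNDARY, text)); // prints []
        System.out.println(findUrls(NBSP_BOUNDARY, text));  // prints [http://google.com]
    }
}
```

With the first pattern the URL is missed because neither `^` nor `\s` matches at the non-breaking space; the second pattern finds it. Whether the real library fails for this reason is an assumption that would need to be checked against its source.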

keitaf commented 12 years ago

Yes they are used in production on Twitter itself.

And thank you for reporting the bug. Ideally they should have the same behavior (and twitter-text-conformance is meant to help verify their consistency), but as you pointed out, there are still some inconsistencies. I'll fix the bug in twitter-text-java.