ropensci / unconf17

Website for 2017 rOpenSci Unconf
http://unconf17.ropensci.org

Code to Picture/Tweet to Code #17

Open seankross opened 7 years ago

seankross commented 7 years ago

I've been inspired by this discussion on Twitter:

These are the kinds of tweets I'm referring to in this discussion:

I actually like seeing screenshots of R code in tweets, but then of course I wish I had the code! You could try to extract the code with tesseract but doing that every time for every tweet can be messy.

What do you think about building a package that takes an R file and creates a screenshot of the code (with options that optimize the screenshot for Twitter), and then in that package we include trained tesseract models for extracting code from those screenshots? There could even be a function that takes a tweet and gives you the code, like tweet_to_code("https://twitter.com/drob/status/840232496860135424", file = "drob.R"). Right now my idea for taking screenshots is webshot::appshot().
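
As a rough illustration of the first step such a tweet_to_code() function would need, here is a minimal base R sketch that splits a tweet URL into its screen name and status ID. The helper name parse_tweet_url() and the URL pattern are assumptions made for illustration, not part of any existing package; fetching the image and running OCR are left out.

```r
# Hypothetical helper (not from any package): pull the screen name and
# status ID out of a tweet URL. The ID stays a character string because
# status IDs are too large to store exactly as R doubles.
parse_tweet_url <- function(url) {
  pattern <- "^https?://(www\\.|mobile\\.)?twitter\\.com/([A-Za-z0-9_]+)/status/([0-9]+)"
  m <- regmatches(url, regexec(pattern, url))[[1]]
  if (length(m) == 0) {
    stop("Not a recognisable tweet URL: ", url)
  }
  # m[1] is the full match, m[2] the optional subdomain
  list(user = m[3], status_id = m[4])
}

tweet <- parse_tweet_url("https://twitter.com/drob/status/840232496860135424")
tweet$user
tweet$status_id
```

Keeping the ID as text avoids silent rounding if it is later pasted into an API request.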

maelle commented 7 years ago

@seankross what a great discussion to be inspired by :wink:

I guess you might also need some sort of hunspell thing with a dictionary made of R functions to help correct the typos that'd probably be created by the OCR?
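
One way to picture the dictionary idea, sketched with base R's adist() (edit distance) rather than hunspell: snap each OCR'd token to the closest known function name when it is within a small edit distance. The helper name correct_tokens(), the tiny dictionary, and the max_dist threshold are all assumptions for illustration.

```r
# Hypothetical helper: replace each token with its nearest dictionary
# entry when the edit distance is at most max_dist; otherwise keep it.
correct_tokens <- function(tokens, dictionary, max_dist = 1) {
  d <- utils::adist(tokens, dictionary)          # tokens x dictionary distances
  nearest <- apply(d, 1, which.min)              # index of closest entry per token
  best <- d[cbind(seq_along(tokens), nearest)]   # distance to that entry
  ifelse(best <= max_dist, dictionary[nearest], tokens)
}

# A toy dictionary; a real one might be built from package namespaces.
dictionary <- c("library", "filter", "mutate", "ggplot", "paste0")
correct_tokens(c("1ibrary", "fi1ter", "my_data"), dictionary)
```

Ties are a real limitation here: a token such as "pasteO" is one edit away from both "paste" and "paste0", and which.min() simply takes whichever comes first in the dictionary.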

maelle commented 7 years ago

Oh wait I guess the typos thing is what you mean by "trained" tesseract models sorry.

jsta commented 7 years ago

This sounds like a valuable complement to reprex.

jasdumas commented 7 years ago

This is an interesting idea!

MilesMcBain commented 7 years ago

Definitely interested in this idea, but particularly the output side:

Right now my idea for taking screenshots is webshot::appshot()

This is great!

So what if we injected some metadata into the image that contained a link to a gist with the output? A corresponding fetch method could take a tweet URL and automatically return the code, using the image metadata to find the corresponding gist.

With this facility the need for OCR would hopefully phase out over time. Although I think the OCR idea is worthwhile in the first instance.

MilesMcBain commented 7 years ago

A LoFi version of this could just return a gist link and an image, leaving the user to pair them in a tweet. But, then you have the link eating into your original witty remark. I dunno how acceptable that would be. In this community probably not very.

batpigandme commented 7 years ago

Could we bot this? I'm thinking something opt-in, where a user would have to set up something akin to IFTTT approval, and it could have a trigger tag. Even as I write this now it's beginning to sound too convoluted, but the end idea would be that there would be a reply triggered with the gist link, so as not to cut into witty-remark real estate.

noamross commented 7 years ago

Twitter, it seems, strips embedded metadata out of image files after it extracts location data: https://support.twitter.com/articles/20156423

Twitter supports attaching metadata to an image in the tweet itself, though I'm not sure what/how much metadata is supported. It can be consumed by the reader. I imagine they don't want this to serve as a place to hide large payloads of information, but putting a gist link in alt_text is probably OK:

Using this would probably require having the R package not just generate the image, but post the tweet as well.

stegasaur, by the incomparable @richfitz, will encode text or arbitrary R objects into images via steganography. This may be the way to go if the data survives any image optimization twitter may perform.

jennybc commented 7 years ago

reprex and gistr seem very relevant to this. I have contemplated having a "tw" venue in reprex already. I think going from code to gist + tweet w/ gist URL + screenshot makes a lot more sense than from tweet w/image to code.

seankross commented 7 years ago

Another idea: As much as I don't like QR codes in principle, maybe there should be a QR or other kind of barcode that's embedded in the image and we could store the gist url there, although this wouldn't be necessary if we can read and write the tweet metadata.

Also @MilesMcBain we could add the gist link to the image itself so a human could read it, or they could use tweet_to_gist([tweet url]) to get the gist url. The image itself would look something like:

# https://gist.github.com/hadley/37c8078eb9d46b5dac7e
awesome_stack_overflow_data %>%
  dplyr_function() %>%
  tidyr_function() %>%
  ggplot3(aes = c(language, awesomeness)) +
    geom_oculus(fov=Inf)
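
If the gist link is printed as the first comment line, as in the mock-up above, then recovering it from OCR'd or alt text is a small pattern-matching job. A minimal base R sketch follows; the helper name extract_gist_url() and the URL pattern are assumptions, and this is not the tweet_to_gist() function itself, which would also need to fetch the tweet.

```r
# Hypothetical helper: return the first gist URL found in some text,
# or NA if there is none.
extract_gist_url <- function(text) {
  pattern <- "https://gist\\.github\\.com/[A-Za-z0-9-]+/[0-9a-f]+"
  hits <- regmatches(text, regexpr(pattern, text))
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

extract_gist_url("# https://gist.github.com/hadley/37c8078eb9d46b5dac7e")
extract_gist_url("no link in this text")
```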

hrbrmstr commented 7 years ago

How about riffing off of/extending: https://github.com/hrbrmstr/hrbraddins/blob/master/R/tweet-share.r

benmarwick commented 7 years ago

+1 for @noamross's suggestion for using stegasaur to transmit code via twitter images!

noamross commented 7 years ago

I realize that PNGs uploaded to twitter seem to be converted to JPGs. Not sure whether the steganographic encoding will survive that.

hrbrmstr commented 7 years ago

some algos can gen steg data that will at least partially survive but it's unlikely source code will.

a good chunk of providers use https://github.com/cloudflare/jpegtran (or a derivative) and one of the design goals is to beat malware which has a side-effect of beating steg in most cases.

sfirke commented 7 years ago

I lean toward the options resulting in an image + gist link. The steganography and OCR approaches sound fun but I suspect they will be less accessible to many people who are interested in the code. That's worth giving up some tweet characters to the gist URL, IMO. @jennybc I agree this feels natural to include in the reprex package as a "tw" option in the existing reprex::reprex function.
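
To make the character trade-off concrete, here is a minimal base R sketch of composing the tweet text with the gist link appended. The helper name compose_tweet() is hypothetical, and both defaults are assumptions: a 140-character limit, and links counted at a fixed length of 23 characters regardless of their real length. Check both against Twitter's current rules before relying on them.

```r
# Hypothetical helper: append a gist URL to a remark, truncating the
# remark if it would not fit. limit and url_length are assumptions.
compose_tweet <- function(remark, gist_url, limit = 140, url_length = 23) {
  budget <- limit - url_length - 1  # one character for the separating space
  if (nchar(remark) > budget) {
    remark <- paste0(substr(remark, 1, budget - 3), "...")
  }
  paste(remark, gist_url)
}

compose_tweet("Tidy!", "https://gist.github.com/hadley/37c8078eb9d46b5dac7e")
```

Under these assumptions the remark gets 116 characters, which is the cost being weighed in this thread.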

karthik commented 7 years ago

💯 to @sfirke. Steganography feels cool and fun, but cramming information into a hidden spot makes it far less accessible. A gist + short URL + screenshot seems best.

jennybc commented 7 years ago

@hrbrmstr's "trick" for getting the screenshot is great but will require LaTeX, because PDF, right? I wonder if there's a way around that?

jennybc commented 7 years ago

Maybe render to html and use webshot?

Oh then you need PhantomJS 😐.

hrbrmstr commented 7 years ago

I always worry abt phantomjs working consistently and also being abused by malware on windows.

noamross commented 7 years ago

I note that Twitter's alt_text field is designed for, and used by, people with visual impairments, so we wouldn't want to hijack it for other purposes. But including text such as, "Image of R code, full code at https://gist.github...." would make the screenshot more accessible and would be a great use of the field whether or not the gist link is included in the tweet separately.
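
A small base R sketch of building that kind of alt text. The helper name make_alt_text() is hypothetical, and the max_chars default is an assumption about Twitter's alt text limit, not a documented value taken from this thread.

```r
# Hypothetical helper: build descriptive alt text that also carries the
# gist link. max_chars is an assumed limit; adjust to the real one.
make_alt_text <- function(gist_url, description = "Image of R code",
                          max_chars = 420) {
  alt <- sprintf("%s, full code at %s", description, gist_url)
  if (nchar(alt) > max_chars) {
    stop("Alt text is ", nchar(alt), " characters; the limit is ", max_chars)
  }
  alt
}

make_alt_text("https://gist.github.com/hadley/37c8078eb9d46b5dac7e")
```

The description argument leaves room for a more specific summary of what the code does, which is what a screen reader user would benefit from most.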

noamross commented 7 years ago

I think you can avoid LaTeX or PhantomJS/webshot solutions entirely by just placing the text onto a blank image with the R graphics device. Some careful tweaking would be needed to make it look good and be right-sized for arbitrary code, and you'd want to pick a good sans font that is accessible to R on most systems, but it avoids any round-tripping.

hrbrmstr commented 7 years ago

imagemagick supports text annotations directly on an image and I'm 99.999% sure (didn't test it) that magick::image_annotate() implements that part of the imagemagick API. i had this as a mental note to try vs my slacker-use of knit-pdf-to-image.

noamross commented 7 years ago

One can do it with just png(), plot.new(), and text(), no? No magic required.
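
A minimal sketch of that approach, using only base graphics. The function name code_to_png(), the image size, the spacing, and the choice of the "mono" font family are all assumptions for illustration; it does no syntax highlighting and does not resize the canvas to fit long code.

```r
# Hypothetical helper: draw lines of code as text on a blank PNG.
code_to_png <- function(code, file = tempfile(fileext = ".png"),
                        width = 1024, height = 512, cex = 1.2) {
  lines <- unlist(strsplit(code, "\n", fixed = TRUE))
  grDevices::png(file, width = width, height = height)
  on.exit(grDevices::dev.off(), add = TRUE)  # the file is written on close
  graphics::par(mar = c(0, 0, 0, 0), family = "mono")
  graphics::plot.new()
  graphics::plot.window(xlim = c(0, 1), ylim = c(0, 1))
  # One row per line of code, spaced by 1.5x the height of a capital M
  step <- graphics::strheight("M", cex = cex) * 1.5
  y <- 0.95 - (seq_along(lines) - 1) * step
  graphics::text(x = 0.02, y = y, labels = lines, adj = c(0, 1), cex = cex)
  invisible(file)
}

img <- code_to_png("x <- c(1, 2, 3)\nmean(x)")
file.exists(img)
```

This requires an R build with PNG device support, and how the "mono" family resolves to an actual font depends on the platform.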

batpigandme commented 7 years ago

After seeing another tweet of pic of code => optimize discussion yesterday (https://twitter.com/tonyfischetti/status/866457187140370433), wanted to reiterate how valuable I think this could be (which might just involve spreading the word re @hrbrmstr's tweet-share script in hrbraddins, if we think it's already been covered).

  1. I hate pictures of code as much as the next person, but, even in instances when the code is < 140 characters, it's not exactly easy to read directly on twitter. If longer, a link to a gist or snippet without an image means that I'm pretty much clicking blindly without clues as to whether or not I actually can help with whatever the question is.
  2. Though I think StackOverflow is great, it's a medium for more "rigorous" question asking, and answering. In the very biased sample of tweets with R code that I see and/or create, the questions, suggestions, etc. seem more casual-- they're "feelers" of a sort. When you respond, there's no expectation that you're giving a definitive, or comprehensive answer, which is okay, since it's twitter. Yes, tweet Qs can be lazy (featuring mine own), but at times #rstats twitter really delivers.
  3. I think there's value in seeing the thread (e.g. w/ the tweet I included at the beginning). Though the same can be done with gists, etc., I think it's a different audience.
  4. If your question is far too complex for twitter, someone will tell you. They'll point you to SO, or send you over to @jennybc's reprex, where you (naïve question-asker that you are) can learn how to help people help you. So, it's valuable either way.

benmarwick commented 6 years ago

For readers curious about the result of this discussion, the pkg is here: https://github.com/ropenscilabs/codefinch (googling took me to this thread, and probably will again the next time I forget the pkg name, so this comment is a kind of redirecting bookmark)

seankross commented 6 years ago

For potentially future reference, I'm currently quite infatuated with carbon: https://carbon.now.sh/

MilesMcBain commented 6 years ago

Oooooooh carbon. Looks awesome!

stephlocke commented 6 years ago

Aye!

And of course I'd love asciinema to come to windows so we could do terminal gifs and then tweet them on windows https://asciinema.org/

MilesMcBain commented 6 years ago

I realised it was only going to be a few minutes' work to add carbon support to gistfo, since they support gists. So you can now send the active RStudio tab to carbon.now.sh: https://github.com/MilesMcBain/gistfo