gusandrews / gitissuescraper

A tool to help scrape not just the issues on a GitHub project, but also the comments on each issue.

Terminology fix: it's not a "scraper". #1

Open kfogel opened 10 years ago

kfogel commented 10 years ago

Hey, Gus -- you're going to hate me for saying this now instead of earlier (actually, I did say it during the team call, but James and I spoke at the same time and I'm pretty sure no one heard me; I didn't repeat it then, as the topic moved on). Anyway:

A "scraper" is a program that gets information from human-readable sources that aren't intended for programs (details here http://en.wikipedia.org/wiki/Data_scraping and here http://en.wikipedia.org/wiki/Web_scraping). In other words, if your program ingests the GitHub issues page -- the same HTML page that you and I see in our browsers -- then it could "scrape" GitHub issues. But if you use the GitHub API to get issue information, then by definition that's not scraping.

Now, obviously, using the API is the right thing to do. I'm not saying any of that should change. I'm just saying don't call it a scraper or refer to what it does as scraping. If you call it a "scraper", people will get confused, because it's the opposite of a scraper -- one only writes scrapers when no API is available.

I realize that this involves not only changing the documentation and filenames and variable names in your code, but the very name of the program itself. But actually, the name needed to change anyway, because the program is no longer only for Git issues (by the way, don't forget to update the README.md file too to indicate that support now extends beyond just GitHub).

gusandrews commented 10 years ago

Out of bounds, Karl. It's the weekend, this is my personal account, I was aware it's not a proper scraper, and James has told me I need to wind down the work on this anyway. And this is the second time I've made an honest working stab at code and someone told me it was wrong for cultural reasons, not functional ones, the first time being when I applied to Hacker School. It's discouraging.

Gus

On Jun 22, 2014 2:11 AM, "Karl Fogel" notifications@github.com wrote:

kfogel commented 10 years ago

Gonna push back a bit on this one, Gus.

It's not out of bounds. It's a bug report against a public repository -- anyone could have filed it. The boundary between functional and cultural is not so sharp: the words you use in describing your program to other programmers are a matter of functionality, in that people take longer to understand the code if they get confused by the terminology.

It's sort of like saying that commenting one's code is a cultural matter rather than a technical matter: sure, if the only thing that matters in the world is how the compiler or interpreter runs the code. But communication is an integral part of programming.

I don't see how the fact that it's the weekend or that it's your personal account matters. (But if it does matter, then James can't be telling you what to wind down and what not, because it's personal not work, right?) However, you've been talking about it at work, and I specifically committed in our team meeting to looking at this code, and you agreed with that and welcomed it. So please don't tell me now that it's the weekend and that it's personal not work. I'm doing what I promised to do, and what you agreed I would do.

Don't hate on people who take the time to deliver well-meaning bug reports that are written in a perfectly friendly tone, that cite references, and that go out of their way to say that they think the way you're doing the code itself is right and that it's only the terminology that's confusing. I also had comments on the code itself (that would have started with questions, so I would understand the goals better first), and now look: I'm afraid to deliver them, because my first bug report got flamed. Sad Karl :-(.

gusandrews commented 10 years ago

Sorry -- I don't know if this was totally clear, but this comes in a season in which I am having a really hard time with people at HOPE giving me shit in a semi-gendered way. I have kind of a short fuse about anything related to technology right now, so I'm trying to reserve what patience I have left for work.

We should probably pull out of this email exchange and voice-chat sometime this week. Written communication doesn't communicate well what's going on.

Gus

On Sun, Jun 22, 2014 at 2:01 PM, Karl Fogel notifications@github.com wrote:

gusandrews commented 10 years ago

and I'm sorry, I'm also writing emails in haste, so I wasn't explaining well enough: What I'm trying to do is really ensure I take my weekends off from anything work or HOPE related -- away from the computer in general. I wasn't tracking that my GitHub account would send mail to this address. In fact, I haven't put anything up that anyone "public" cared about before, so I didn't even realize mail would come through here -- it's never happened before.

I understand I work in an open-source organization, and I appreciate the goals of open code. But I wouldn't have posted it to GitHub at all if Jonathan hadn't recommended it (or James if Jonathan hadn't been right there telling me to). It's my shitty code that barely works for me. It's my crude handmade tool; I don't know that I'd lend it to anyone, much less give it to someone else in a business context. I don't expect it will be of use to anyone. I barely feel comfortable posting it there because I know I don't write code in a way that is up to community norms, because my education is full of holes.

anyway. gotta go do laundry. it's the weekend.

Gus

On Sun, Jun 22, 2014 at 6:13 PM, gus andrews gus.andrews@gmail.com wrote:

kfogel commented 10 years ago

Totally understand -- we can chat by voice about it soon (when you've got time, but FWIW I should have some tomorrow). Enjoy the rest of the weekend!