Closed ianmilligan1 closed 7 years ago
I'm not formally reviewing, just trying to work through the lesson for fun. The intro text is great, and I especially like the "possibilities" section with links to the cited work!
Testing on a Mac (El Capitan 10.11.6), I ran into problems with both the graphical and command-line Anaconda installers, poked around on StackExchange for a bit, and haven't yet figured out the issue (it's likely particular to my computer setup, but I'm reporting it in case another Mac user also has trouble; I'll update if I figure out the problem.)
@amandavisconti, what's the error that you're getting? Is it in installing Anaconda, or the Feature Reader library? We tried many different ways to simplify the installation as much as possible, if there are still glitches, I'd love to address that pre-emptively. Let me know here, or via email (my username at Gmail).
We recently switched over from pip
instructions to conda
(because the latter favours already built distributions, making things easier for Windows users), so my suspicion is that maybe there's a problem there?
@organisciak The error is with the Anaconda installer, not the Feature Reader library; the graphical display installer gives an error message that suggests I contact the Anaconda project maintainers, and I'll check on the command-line error message when I'm home. (Will also update when I figure out the issue in case it's useful for others.)
Thanks for writing this lesson—I really like the introduction and am looking forward to reading the rest!
(Edit: This is from @amandavisconti—forgot to log out while doing Slack-GitHub integration stuff.)
Thanks @amandavisconti @DigitalHumanitiesSlack for looking into this! It worked on OS X but I don't have access to a Windows box (something I really need to get at some point..).
Anyways, just a quick update @organisciak – I've reached out to two peer reviewers, just waiting on their responses. My experience with academic August is that sometimes we can all be a bit delayed (myself included!), so patience is probably necessary. 😄
Update: two reviewers have accepted and agreed to review the lesson.
This is probably discussed somewhere in the PH world, but I find it a bit awkward to reference particular parts of a document, short of annotating a PDF view or referencing a blame view. Advice?
I don't think you're alone in that problem, actually, which speaks to the hypothesis comments that @amandavisconti coordinated a few months ago.
The quick and easy solution might be to reference line numbers from the edit view, or you could use hypothesis to mark up the document.
I'll ping @wcaleb just for a future conversation to be had... I wonder if in the future, line numbers or something could appear in the ph-submissions site preview.
I think this will make a fabulous addition to the PH family for several reasons, including how it explains many key concepts of working with Pandas and how it provides useful and generative examples of interacting with HTRC. The structure and pace of the document are generally excellent, though I'd suggest that the authors have another couple of passes reading it as though they were encountering for the first time several of the more technical components. Linking even more to external documentation on Python, libraries and even HTRC may be a good way of addressing relative newcomers while not weighing down the notebook too much.
I've embedded several comments into the attached PDF (apologies, but that seemed like the most expedient way to annotate) and I'd be happy to discuss or develop any of the comments (apologies in advance for their brevity).
Some other bigger picture comments not in the PDF:
I've noted above and in the PDF comments quite a few things came to mind as I was reading, things that might be called constructive criticism, but I want to reiterate that there's lots to love here and that I'm looking forward to using this guide with my students!
Data Analysis in Python through the HTRC Feature Reader _ Programming Historian SGS.pdf
Thanks for this @sgsinclair – very much appreciated. The annotated PDF works swimmingly, and I think all of the comments look manageable and well-advised to me. The four big picture comments are also very helpful – points 1, 3, and 4 I'll leave to your thoughts.
As for point 2, I think that has been one tension we've had as the mandate of the Programming Historian continues to grow. Perhaps we could recast a bit with a few examples from literary history? I would be happy to chat and think about some examples we can run.
I'll hold off on more comments at this point as we're still waiting on one more external peer review. Once I receive that, I'll read both reviews, summarize commonalities, and we'll move forward into the revision process.
Again, thanks @sgsinclair for your thoughtful and comprehensive review!
Hi, everyone! I second all of the enthusiastic comments that have already been posted. It was a pleasure to work through this lesson; I look forward to recommending it to colleagues once it's up and running on PH.
Here's a PDF with notes I took while working through the lesson: DataAnalysisInPython_cderose.pdf I'll be glad to elaborate on or clarify any of them.
For a couple more general thoughts:
If in addition to the pip install there are other alternatives or troubleshooting recommendations you could add, that would be especially great for beginners who might get anxious when the code doesn't immediately work and there's no obvious next step, as @sgsinclair points out. If you do come up with other alternatives, I'd be happy to try them out or brainstorm different roadblocks people might encounter in setting things up so that you can preemptively address them.
Such a terrific lesson - can't wait to see it on PH!
Thanks so much @cderose for your insightful comments, both in the prose above as well as in the attached PDF. I think they'll all serve to enhance the lesson's flow and readability, and thanks for catching little things here and there. And the source activate
error trace is very helpful.
Really appreciate it, again.
OK. At this point we'll close "open review," and I will begin synthesizing the two reviews for you Peter.
I think these were two very positive peer reviews, @organisciak, with good concrete suggestions for way to improve the lessons.
My recommendation at this stage is that you go through and incorporate the in-line suggestions from both @cderose and @sgsinclair. I've read them all and they're worth responding to, and I don't think they should be too onerous. And if you want a Milligan reference, feel free to ping me.
Both reviewers noted in different ways how sometimes the lesson skipped over technical things quite quickly. Both Catherine and Stéfan did an excellent job of showing specific locations in the PDFs.
The one outstanding issue is then the choice of examples. As you go through, why don't we try to brainstorm a few quick historical questions we can ask. Maybe we can work through the notebook together, and try to think of a few examples.
So, at this stage, why don't you incorporate the suggestions as you're able Peter. Ideally, this would be completed within the next three weeks. And then we can touch base, have a quick chat about any specific historical examples or other outstanding issues, before beginning the process of moving this over to the main repo.
Great. Thanks everybody! I agree that there are solid recommendations here. This message was buried in my inbox during some travel, so I'm turning to the updates now. @capitanu and I will update as best as we can, then follow-up with @ianmilligan1.
Also, looks like more tinkering is in order on how to most easily install the software. We switched from pip
instructions to conda
instructions because building low-level dependencies was tricky on Windows, but clearly overlooked new sticking points there. Post revision, it will be worth double-checking whatever we figure out: I very much want to make it smooth.
Fantastic, thanks @organisciak. Please do keep me posted.
Also – @sgsinclair – we ended up implementing line numbers on the submissions repo, so thanks so much for the suggestion. Please check it out here. I think this will make our reviewing process moving forward.
Fancy numbering! Out of curiosity, is there a reason why headers don't get numbers? (I'm just thinking of how reviewers may want to refer to those as well.)
@sgsinclair I'll ask your question in this issue here, as feedback is really useful!
@ianmilligan1, I uploaded a new version of the lesson (https://github.com/programminghistorian/ph-submissions/commit/968143b1b2aacaf826a5245093840aa2aa5c46a4#diff-3e93430), responding to the notes from @sgsinclair and @cderose. To the reviewers, your comments were phenomenal, and I believe the lesson is better for it.
What's the next step?
Great, thanks @organisciak. I'll review the diff this week, hopefully by Thursday, and get back to you with any last minute suggestions. Once we're both happy, I'll begin the process of migrating it over to the live Programming Historian repo. You'll get a chance to proof the lesson, and then we'll be off to the races.
Be in touch soon. Can't wait to get this out to the public!
Thanks, Ian. I earlier neglected to format the images in the Jekyll format, so I corrected that tonight.
I have three formatting notes/questions:
For Jekyll questions, I'll ping you over to @wcaleb – any thoughts on images and formulas, Caleb?
For line numbering, they won't be in the final version on our main repo (although you'll still have a chance to proof before we connect it to the main site!).
I've had a chance to go through your diff, @organisciak, comparing it with the comments by both @sgsinclair and @cderose. You've done a fantastic job and I think you should be commended with the care and attention you've made to the suggestions.
I've made two small suggestions which I've pushed forward (a link to JSON and your username).
Both reviewers ran into the same issue with the conda install -c organisciak htrc-feature-reader
command. I have tried to make it a bit more explicit that they should swap out the username. I did so with this language:
Now, you need to type one command:
conda install -c USERNAME htrc-feature-reader
This command installs the HTRC Feature Reader and its necessary dependencies. We specify -c
so that the installation command knows to look for the program under the USERNAME
channel, the username of the library author. You will need to change USERNAME
to your own username, which you can find by typing whoami
on a Mac or echo %USERNAME%
on Windows. For example, the author uses:
conda install -c organisciak htrc-feature-reader
Feel free to push back on this. I've put the language into the repo but you can revert. 😄
Actually, it should be my username! Stefan mentioned that it might cause confusion and I clearly didn't do a good enough job addressing that. I'll give it another try. The alternative: maybe @borice and I should start an 'htrc' account on Anaconda to host the library?
The problem the reviewers ran into was that one of the dependencies, PySolr, isn't in the default conda channel, and I hadn't built a version for Mac OS. I believe I've fixed this. Now that I've spent more time figuring out conda, I'm confident that it is the better way to prepare things for novice users, but it took some learning on my part to offer it properly.
Ahhh, this makes sense. Yeah, my instinct would be to swap out the usernames (I guess I didn't when I installed this way back when however). OK. Let me roll that back.
Maybe an HTRC account for Anaconda would make more sense, if it is trivial to implement?
When you installed it, it was probably when I had the instructions as pip
, so no worries. Turns out it was trivial to add an organization so I made it conda install -c htrc htrc-feature-reader
:)
Great, works like a charm.
Migrated proof over to: http://programminghistorian.org/lessons/text-mining-with-extracted-features
We'll figure out figures & formulas soon. Can you give a bit more detail on what you might want done with figures - you think some are too big?
Also @organisciak @borice: could you send me short bios, either here or via e-mail. Just a few lines if possible.
With the images, I think the easiest way to make them smaller would be to actually change the size of the image file. Our site uses Jekyll templates to generate the images, so it's not easy to tweak the size on a lesson by lesson basis.
As for formulas, would you be able to format the math in LaTeX as described here? I think that our Markdown parser would be able to handle that syntax if you can get your formulas into it.
Ah, two dollar signs, slightly different than the Pandoc markdown that Jupyter follows. I'll also bulk resize the images.
Should updates continue in this repo, to be merged into programminghistorian/jekyll later?
Aye, please do them here and I will merge them. Just ping me when you're happy with both.
Updated with f2b2a18. The resized images won't look great on higher resolution screens (Macbooks and Surfaces), so I kept the originals in case the template is ever updated.
Awesome. Let me move this over and we'll see how it looks on the main site!
OK proof is living here: http://programminghistorian.org/lessons/text-mining-with-extracted-features.
Can you take a look and make sure it looks good to you? And also – short bios would be great, and I can add them to the system.
We found a few errors, the new commit should be the last.
Bios: Peter Organisciak is an information scientist, at the HathiTrust Research Center as a postdoctoral research associate. Boris Capitanu is a research programmer at the HathiTrust Research Center.
Images What do you think of this pamphlet box:
Thanks for the bios! I'll review and move over in a bit. And picture looks great, let me tinker with it.
(May take me a day or two as some short-deadline work has just appeared on my plate..)
Hi Ian,
I didn't respond to Peter in time when he asked me for a bio as I've been out with my little girl for a checkup at the doctor. Could you please use the following for my bio:
Boris Capitanu is a Research Programmer at the HathiTrust Research Center. His research interests include large scale data analysis, machine learning, data mining, and educational technologies. Boris holds a B.S. and M.S. in Computer Science from University of Illinois at Urbana-Champaign, and a MBA+MHRIR from the same institution.
Thank you, Boris
On Nov 23, 2016, at 4:41 PM, Ian Milligan notifications@github.com wrote:
Thanks for the bios! I'll review and move over in a bit. And picture looks great, let me tinker with it.
(May take me a day or two as some short-deadline work has just appeared on my plate..)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/programminghistorian/ph-submissions/issues/29#issuecomment-262647705, or mute the thread https://github.com/notifications/unsubscribe-auth/AAaT87Nw-qQb2h0XTFnyUGfu-2CxV0-Jks5rBMEZgaJpZM4JZCtG.
Great, thanks @borice & @organisciak.
I've put a new proof up at http://programminghistorian.org/lessons/text-mining-with-extracted-features (you may need to do a hard refresh). I've also taken your suggested image, Peter, and converted it into a 200x200 greyscale. Look good?
Looks great!
On Thu, Nov 24, 2016, 7:43 AM Ian Milligan notifications@github.com wrote:
Great, thanks @borice https://github.com/borice & @organisciak https://github.com/organisciak.
I've put a new proof up at http://programminghistorian.org/lessons/text-mining-with-extracted-features (you may need to do a hard refresh). I've also taken your suggested image, Peter, and converted it into a 200x200 greyscale. Look good?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/programminghistorian/ph-submissions/issues/29#issuecomment-262791576, or mute the thread https://github.com/notifications/unsubscribe-auth/AADj72rq1-_tNcZKJQ0NUNXYYlGovfVzks5rBaKFgaJpZM4JZCtG .
Hurray!
I have done a soft launch, so you can see your entry here (http://programminghistorian.org/lessons/) - under distant reading. The lesson is also fully live.
That said, I won't launch publicity until Monday (Friday of the Thanksgiving weekend in November is not an ideal launch time) so if you catch anything, let me know. 😄
Awesome. That's also when HathiTrust is publicizing the full collection Extracted Features Dataset which we soft launched in October, so it may get more eyes on the lesson.
On Fri, Nov 25, 2016 at 1:27 PM Ian Milligan notifications@github.com wrote:
Hurray!
I have done a soft launch, so you can see your entry here ( http://programminghistorian.org/lessons/) - under distant reading. The lesson is also fully live.
That said, I won't launch publicity until Monday (Friday of the Thanksgiving weekend in November is not an ideal launch time) so if you catch anything, let me know. 😄
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/programminghistorian/ph-submissions/issues/29#issuecomment-263019180, or mute the thread https://github.com/notifications/unsubscribe-auth/AADj73qLZPUFWu6Q_Fm_Ym1jd_1CZRmVks5rB0SlgaJpZM4JZCtG .
🎉 Hurray, that's awesome timing.
The Programming Historian has received the following tutorial on 'Data Analysis in Python through the HTRC Feature Reader' by @organisciak and @borice. This lesson is now under review and can be read at:
http://programminghistorian.github.io/ph-submissions/lessons/text-mining-with-extracted-features
I will act as editor for the review process. My role is to solicit two reviews from the community and to manage the discussions, which should be held here on this forum. I have already read through the lesson and provided feedback, to which the author has responded.
Members of the wider community are also invited to offer constructive feedback which should post to this message thread, but they are asked to first read our Reviewer Guidelines (http://programminghistorian.org/reviewer-guidelines) and to adhere to our anti-harassment policy (below). We ask that all reviews stop after the second formal review has been submitted so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred.
I will endeavor to keep the conversation open here on Github. If anyone feels the need to discuss anything privately, you are welcome to email me. You can always turn to @acrymble or @amandavisconti if you feel there's a need for an ombudsperson to step in (as I am not going to act as an ombudsperson on a lesson I am actively editing!).
Anti-Harassment Policy
This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.