lintool / warcbase

Warcbase is an open-source platform for managing and analyzing web archives
http://warcbase.org/

Documenting all Functions in Wiki #174

Closed: ianmilligan1 closed this issue 8 years ago

ianmilligan1 commented 8 years ago

This is a big task, but I think down the road we should aim to document every function baked into warcbase, e.g. DetectLanguage, DetectMimeTypeTika, ExtractBoilerpipeText, ExtractEntities, etc. Right now, I know what's out there from munging around in the repo, but a glossary of the various commands, with examples of how to use them, would be helpful.

This would go for the commands baked within them, too, such as RecordLoader.loadWarc, ExtractEntities.extractFromRecords etc.

This might be a perfect job for a summer student?

lintool commented 8 years ago

@aliceranzhou I assume Scala has the equivalent of Javadoc, e.g., http://www.scala-lang.org/api/current/#package

Can you send up the proper Maven build command to generate such docs? Then add this command to README.md and anyone can build up-to-date docs. Then we can just use the inline comments for the UDFs as docs.

aliceranzhou commented 8 years ago

Will do!

aliceranzhou commented 8 years ago

Oh this is really cool – it's already there: mvn scala:doc generates Scala docs and outputs results in the target/site directory.
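
As a rough illustration of the inline comments discussed above, a Scaladoc-style comment on a function could look like the sketch below. The object `ExampleUdfs` and the method `extractHost` are hypothetical names made up for this example and are not part of warcbase; the point is only the comment syntax (`/** ... */`, `@param`, `@return`) that a Scaladoc build would turn into HTML.

```scala
import java.net.{URI, URISyntaxException}

/** Hypothetical helpers, shown only to illustrate Scaladoc syntax.
  * Nothing here is actual warcbase code.
  */
object ExampleUdfs {

  /** Extracts the host name from a URL.
    *
    * @param url an absolute URL such as "http://warcbase.org/docs"
    * @return the host portion, or an empty string if the URL has no host
    *         or cannot be parsed
    */
  def extractHost(url: String): String =
    try {
      // getHost returns null when the URI has no host component
      Option(new URI(url).getHost).getOrElse("")
    } catch {
      case _: URISyntaxException => ""
    }
}
```

With comments in this form, the description, parameter, and return value would appear on the generated page for `ExampleUdfs`, so the source stays the single place where the documentation is maintained.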

aliceranzhou commented 8 years ago

I've added the instruction to README.md under the other mvn instructions.

Would we publish the site?

lintool commented 8 years ago

We should, but not yet... because the documentation is still pretty sparse :)

The easiest way would be to just check in the documentation and use gh-pages to host. However, things in target/ really are meant to be built, not checked-in... so we'll probably end up creating a separate repo to host the documentation, something like warcbase.docs.

aliceranzhou commented 8 years ago

Ah, I see. Sounds good :)

ianmilligan1 commented 8 years ago

Took a kick at documenting some more functions - http://lintool.github.io/warcbase-docs/Spark:-Several-Basic-Commands/.

lintool commented 8 years ago

broken link?

ianmilligan1 commented 8 years ago

Fixed here: http://lintool.github.io/warcbase-docs/Spark-Several-Basic-Commands/

The :- sequence in the links broke URL generation when the page was first loaded, so I changed all the URLs so that they work properly. I'll do a quick search through our repo to make sure there are no other broken links.
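
For anyone hitting the same problem, here is a minimal sketch of one way to derive link-safe page names from titles. It is an assumption for illustration only: `SlugExample` and `slugify` are hypothetical names, and this is not how the docs site actually generates its URLs. The idea is simply to replace every run of characters other than ASCII letters and digits with a single hyphen, so a colon never ends up in the path.

```scala
object SlugExample {

  /** Turns a page title into a URL slug (hypothetical helper).
    *
    * Every run of characters other than ASCII letters and digits becomes
    * a single hyphen, and hyphens at either end are dropped.
    */
  def slugify(title: String): String =
    title
      .replaceAll("[^A-Za-z0-9]+", "-") // e.g. ": " or ":-" collapses to "-"
      .replaceAll("^-+|-+$", "")        // strip leading and trailing hyphens
}
```

Under this scheme, both the title "Spark: Several Basic Commands" and the old page name "Spark:-Several-Basic-Commands" map to the same slug as the fixed link.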