marklogic-community / marklogic-spring-batch

Write batch processing applications in MarkLogic

Import JSON Documents from Directory #57

Closed sastafford closed 8 years ago

sastafford commented 8 years ago

As a user, I want to import JSON documents from a directory so that I can quickly ingest them and begin to query over the data.

Given three JSON documents in a directory

When

Then two of the three documents exist in MarkLogic after the job concludes

sastafford commented 8 years ago
  1. Do we have to ingest JSON documents via a REST endpoint, or is there a different protocol that you would like to use to talk to ML? _You are going to modify the LoadDocumentsFromDirectoryJob to import JSON files. You are going to use the same ItemReader but a different ItemProcessor and ItemWriter. You will need to create a new ItemWriter similar to the DocumentItemWriter._
  2. Is this about just picking up a JSON file and storing it as it is? _Yes._
  3. Do we have to support document URI generation? _No. That will be my job when I complete issue #60. For now, use the URI_ID parameter._
  4. Do we have to support a URI pattern (controlled by the user through config), such as prefixing all document URIs with /xyz/abc/? _Out of scope for this issue, but that is something that we need to support. I might be able to address it via issue #60._
  5. Do we have to support storing the property fragment along with the document (collections, privileges)? _No, we can add that later._
  6. Do we have to ingest line-delimited JSON as separate documents? _No, that is out of scope for this issue._
  7. Do we have to ingest array elements (if in the root) as separate documents? _No, that is out of scope for this issue._
  8. Do we have any kind of logging/stats requirements, such as how we report back the status of the job and the number of documents created? _No, that is one of the features of Spring Batch. Batch job status and the number of docs processed are provided by the Spring Batch framework._
  9. Do we have to support any kind of validation, such as JSON schema validation? _Throw an exception if the JSON is malformed._
  10. Do we have any kind of resource/performance specification, such as not loading the entire JSON into memory but streaming it to save some RAM? _Get it working first. We will open up another issue later if performance is an issue._
  11. What's the definition of done, other than code completion and functionality completion?
  12. Unit test coverage %?
  13. Integration test?
  14. Java API documentation?

I would like to see an integration test like LoadDocumentsFromDirectoryTest.
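
Answer 9 above asks for an exception when the JSON is malformed. A minimal sketch of such a check follows, written against the JDK only so that it runs standalone. The class `JsonWellFormednessCheck` and its method `requireWellFormed` are hypothetical names, not part of marklogic-spring-batch; in the actual job, a JSON library called from the new ItemProcessor would probably be the more practical choice. The sketch checks structure only and does not validate escape sequences or control characters inside strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rejects text that is not a single well-formed JSON value. */
final class JsonWellFormednessCheck {
    private static final Pattern NUMBER =
            Pattern.compile("-?(0|[1-9]\\d*)(\\.\\d+)?([eE][+-]?\\d+)?");

    private final String s;
    private int pos;

    private JsonWellFormednessCheck(String s) { this.s = s; }

    /** Returns the text unchanged, or throws IllegalArgumentException if it is malformed. */
    static String requireWellFormed(String text) {
        JsonWellFormednessCheck c = new JsonWellFormednessCheck(text);
        c.skipWs();
        c.value();
        c.skipWs();
        if (c.pos != text.length()) throw c.error("trailing content");
        return text;
    }

    private void value() {
        if (pos >= s.length()) throw error("unexpected end of input");
        char ch = s.charAt(pos);
        if (ch == '{') object();
        else if (ch == '[') array();
        else if (ch == '"') string();
        else if (s.startsWith("true", pos)) pos += 4;
        else if (s.startsWith("false", pos)) pos += 5;
        else if (s.startsWith("null", pos)) pos += 4;
        else number();
    }

    private void object() {
        pos++; // consume '{'
        skipWs();
        if (peek() == '}') { pos++; return; }
        while (true) {
            skipWs();
            if (peek() != '"') throw error("expected a string key");
            string();
            skipWs();
            expect(':');
            skipWs();
            value();
            skipWs();
            if (peek() == ',') { pos++; continue; }
            expect('}');
            return;
        }
    }

    private void array() {
        pos++; // consume '['
        skipWs();
        if (peek() == ']') { pos++; return; }
        while (true) {
            skipWs();
            value();
            skipWs();
            if (peek() == ',') { pos++; continue; }
            expect(']');
            return;
        }
    }

    private void string() {
        pos++; // consume the opening quote
        while (pos < s.length()) {
            char ch = s.charAt(pos++);
            if (ch == '"') return;
            if (ch == '\\') pos++; // skip the escaped character without validating it
        }
        throw error("unterminated string");
    }

    private void number() {
        Matcher m = NUMBER.matcher(s).region(pos, s.length());
        if (!m.lookingAt()) throw error("unexpected character");
        pos = m.end();
    }

    private char peek() { return pos < s.length() ? s.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) throw error("expected '" + c + "'");
        pos++;
    }

    private void skipWs() {
        while (pos < s.length() && " \t\n\r".indexOf(s.charAt(pos)) >= 0) pos++;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException("Malformed JSON at offset " + pos + ": " + msg);
    }
}
```

Calling `requireWellFormed` on each file's content before it reaches the writer would surface a malformed document as an exception, which Spring Batch treats as a step failure unless a skip policy is configured.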

sanjuthomas commented 8 years ago

About the acceptance criteria:

"input_file_path = the directory containing the three json docs
input_file_path = a value which would exclude out 1 of the 3 json documents"

Is the second key input_file_path or input_file_pattern?
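
To make the question concrete, here is a rough sketch of how a directory plus a filter could select two of three files. It assumes the second key is a glob pattern named `input_file_pattern`, which this thread does not confirm. `JsonFileSelector` is a hypothetical JDK-only helper, not the project's ItemReader.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: lists the regular files in a directory whose names match a glob. */
final class JsonFileSelector {

    private JsonFileSelector() {}

    /**
     * @param inputFilePath    directory holding the candidate documents
     * @param inputFilePattern glob matched against each file name, e.g. "*.json" (assumed semantics)
     */
    static List<Path> select(Path inputFilePath, String inputFilePattern) throws IOException {
        List<Path> matches = new ArrayList<>();
        // newDirectoryStream applies the glob to the file name of each directory entry
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputFilePath, inputFilePattern)) {
            for (Path candidate : stream) {
                if (Files.isRegularFile(candidate)) {
                    matches.add(candidate);
                }
            }
        }
        Collections.sort(matches); // directory iteration order is unspecified, so sort for stable results
        return matches;
    }
}
```

With three files in the directory and a pattern that matches only two of them, `select` returns two paths, which mirrors the "two of the three documents" outcome in the acceptance criteria.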

sanjuthomas commented 8 years ago

Do we have a standard code formatter XML file for Eclipse? Or PMD or FindBugs rules?

rjrudin commented 8 years ago

I recommend .editorconfig - here's a simple one - https://github.com/rjrudin/slush-marklogic-spring-boot/blob/dev/templates/.editorconfig

I know Eclipse has a plugin for it, and it's already integrated with IntelliJ.

sanjuthomas commented 8 years ago

@rjrudin Thanks; I can use IntelliJ.

Do you know why many test cases fail in standalone mode (when run independently from the IDE)?

sastafford commented 8 years ago

I am using an updated version of the Java Client API that has a bug fix. It should take priority over the older Java Client API jar on the classpath.

https://github.com/sastafford/marklogic-spring-batch/wiki/Troubleshooting

sanjuthomas commented 8 years ago

@sastafford Please see the pull request - https://github.com/sastafford/marklogic-spring-batch/pull/62

sanjuthomas commented 8 years ago

@sastafford new pull request is at https://github.com/sastafford/marklogic-spring-batch/pull/64