Closed kornysietsma closed 7 years ago
Thanks for reporting this issue. If you have the time, please submit a fix.
Actually, on writing some tests for an upcoming patch, I found that select-columns
actually mostly works - it does skip blank rows/cells, but as it uses .getColumnIndex
to work out column mappings, it assigns the correct column labels anyway.
What is the preferred behaviour for select-columns
on a blank row/column?
With the Clojure semantics of maps, empty map for a blank row would have the same result as the latter solution in normal use cases, so I would support your recommendation, blank row = empty map.
PS: please note that the repo has moved, the old master still works but for good measure consider changing your master to github.com/mjul/docjure (was: github.com/ative/docjure)
For all those with this problem, as a quick fix you can just use with-redefs
without changing dependencies or cloning the repo:
(with-redefs [dk.ative.docjure.spreadsheet/row-seq (fn [^Sheet sheet]
(map #(.getRow sheet %) (range 0 (inc (.getLastRowNum sheet)))))]
;;load your excel file here
)
(:import org.apache.poi.ss.usermodel.Sheet)
in your ns declarationJust to check if the project is still alive. Ability to read sparse tables correctly is kind of a big deal and at least my experiments with this PR show it's legit and does the right thing. Be great to have it merged.
Big kudos to @kornysietsma for the fix! Big thanks to creators and maintainers!
Note: with this fix rows of different lengths may require padding at the end. Not a big deal and probably not something you want by default but might trip someone. Only matters in cases where you deal with valid tables with some rows without data in their tail-cells.
@vkz I agree. If someone would update the pull req (#22) from @kornysietsma to match the latest version and add some example spreadsheets with missing rows and cells to the test suite, I would be happy to merge it.
I'll try to update it - hopefully it shouldn't take long, I just need prodding :-)
On 11 Aug 2016 8:52 a.m., "Martin Jul" notifications@github.com wrote:
@vkz https://github.com/vkz I agree. If someone would update the pull req (#22 https://github.com/mjul/docjure/pull/22) from @kornysietsma https://github.com/kornysietsma to match the latest version and add some example spreadsheets with missing rows and cells to the test suite, I would be happy to merge it.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mjul/docjure/issues/19#issuecomment-239094884, or mute the thread https://github.com/notifications/unsubscribe-auth/AABFZTogQdEj-I-pHeWd68GJR-_R5HdKks5qetS3gaJpZM4Ba29W .
I've been looking again at this patch, and I realised why I didn't get back to it - it needs some decisions on how best to implement it, and my preferred option may be a breaking change for some clients.
The problem is, ideally I'd want to just replace the existing row-seq and cell-seq with the dense functions, so they always return nil for missing rows / cells. However, this could easily be a breaking change if there are clients that don't expect nil return values (for example, the docjure 'set-row-style!' function would need to be modified to ignore nil cells)
Would this be OK?
Alternatively, you could make this optional - pass an extra missing-cell-policy flag, or set a local binding like missing-cell-policy that specifies which approach to use, and add a macro like (with-missing-cell-policy :return-nil ...) - but this might be needless complication.
So, would you prefer:
On 11 August 2016 at 12:39, Kornelis Sietsma kornys@gmail.com wrote:
I'll try to update it - hopefully it shouldn't take long, I just need prodding :-)
On 11 Aug 2016 8:52 a.m., "Martin Jul" notifications@github.com wrote:
@vkz https://github.com/vkz I agree. If someone would update the pull req (#22 https://github.com/mjul/docjure/pull/22) from @kornysietsma https://github.com/kornysietsma to match the latest version and add some example spreadsheets with missing rows and cells to the test suite, I would be happy to merge it.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mjul/docjure/issues/19#issuecomment-239094884, or mute the thread https://github.com/notifications/unsubscribe-auth/AABFZTogQdEj-I-pHeWd68GJR-_R5HdKks5qetS3gaJpZM4Ba29W .
Kornelis Sietsma korny at my surname dot com http://korny.info .fnord { display: none !important; }
Hey - trying to close this off... :)
In the absence of responses to my previous over-long essay, I'm going to go with the slightly messy option of letting people choose which way docjure will work via a dynamic var and a wrapper macro.
That seems to be the only way to preserve current behaviour, without duplicating a whole bunch of functions, or adding extra parameters all over the place
So if you wrap code as follows:
(with-missing-values-as-nil (for [row (row-seq sheet) cell (cell-seq row)] ... )
then missing rows and cells will be returned as nil
I'm cleaning up the tests and docs, will probably get a pull request in soon.
Hi @kornysietsma I think your initial initial suggestion of always returning nils for blank rows makes more sense. That it is not backwards compatible is not a problem: it is a bug-fix, after all. Thanks for helping out. Cheers, Martin
Not my decision but I'd also prefer option 2 - returning nil for blank rows/cells. The reason is that removing nils from the dense version is easy with standard clojure functionality. Adding nils to the non-dense version at the right places is impossible.
So the former is more general one but still requires only few workarounds to make it work as before.
Probably the changelog should contain the word BREAKING CHANGE and you might want to increase the major version.
Cool - I'll go down that route then, it'll keep the code and the tests much simpler.
On 6 September 2016 at 08:54, Daniel Michulke notifications@github.com wrote:
Not my decision but I'd also prefer option 2 - returning nil for blank rows/cells. The reason is that removing nils from the dense version is easy with standard clojure functionality Adding nils to the non-dense version at the right places is impossible.
So the former is more general one but still requires only few workarounds to make it works a before.
Probably the changelog should contain the word BREAKING CHANGE and you might want to increase the major version.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mjul/docjure/issues/19#issuecomment-244876316, or mute the thread https://github.com/notifications/unsubscribe-auth/AABFZT97_KoVhiMUisrtzX3XhrwBq9yhks5qnRwhgaJpZM4Ba29W .
Kornelis Sietsma korny at my surname dot com http://korny.info .fnord { display: none !important; }
I've updated the pull request #22
@mjul: Thanks for a great tool. All the others here too, esp. also @kornysietsma.
On topic: I think returning nil always for blank rows and cells is the right decision. It will make row-seq and cell-seq much more dependable.
The fix for this issue from @kornysietsma has been included in version 1.11.0. Thanks for helping make the library better.
@kinleyd Thanks for the kind words. I was very happy to see that you are using it in Bhutan as it is one of my favourite countries. If you have any ideas or suggestions, don't hesitate to contribute to the project.
@dmichulke Thanks for the thoughtful feedback. I decided not to upgrade the major version as I consider this a bug-fix (blank lines being skipped was a thing leaking through from the underlying POI library). Please keep your suggestions and pull requests coming.
@vkz This has now been fixed, so please give it a whirl and submit feedback, suggestions and pull requests to make it even better.
@kornysietsma Thanks a lot for helping out with this. It is really good work. Please send more pull requests in the future.
@mjul: Nice to know you like Bhutan! docjure is a great library and I look forward to making some contributions where I can. Keep up the good work.
On Mon, Sep 19, 2016 at 7:22 PM Martin Jul notifications@github.com wrote:
@kornysietsma https://github.com/kornysietsma Thanks a lot for helping out with this. It is really good work. Please send more pull requests in the future.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mjul/docjure/issues/19#issuecomment-247991174, or mute the thread https://github.com/notifications/unsubscribe-auth/ABWrvyNOCM-1J13C485_dTMtSG_OQO2Yks5qrox2gaJpZM4Ba29W .
This is basically the same as the issue my colleague nurous raised against clj-excel: https://github.com/mebaran/clj-excel/pull/5
The root problem is that iterators in POI skips blank rows and cells. As a result, a blank row won't be returned in a
row-seq
call, and a blank cell won't be returned in acell-seq
call. Worse, when running functions likeselect-columns
the results can be quite wrong due to these problems.(note that this is for truly blank cells - a spreadsheet seems to be implemented somewhere as a sparse array. Cells where you've entered data previously, and then deleted the data, may still be returned by these functions)