Closed tbsvttr closed 1 year ago
well, that's because CSV is a flat format, it doesn't support "nesting", you can work around it by using methods to extract/generate data, that way you have full control. Here's an example:
class Address < Shale::Mapper
attribute :street, Shale::Type::String
attribute :city, Shale::Type::String
end
class Person < Shale::Mapper
attribute :name, Shale::Type::String
attribute :address, Address, default: -> { Address.new }
csv do
map 'name', to: :name
map 'street', using: { from: :street_from_csv, to: :street_to_csv }
map 'city', using: { from: :city_from_csv, to: :city_to_csv }
end
def street_from_csv(model, value)
model.address.street = value
end
def street_to_csv(model, doc)
doc['street'] = model.address.street
end
def city_from_csv(model, value)
model.address.city = value
end
def city_to_csv(model, doc)
doc['city'] = model.address.city
end
end
person = Person.from_csv(<<~DATA)
John Doe,Oxford Street,London
DATA
# => #<Person:0x2a06a
# @name="John Doe"
# @city="London",
# @street="Oxford Street">
person.to_csv
# => "John Doe,Oxford Street,London"
Thanks for your quick reply! I just edited my initial post again to improve upon it and showing that there is a bit of a problem with nil values in CSV, too.
well, that's because CSV is a flat format, it doesn't support "nesting",
That is absolutely correct! However, shouldn't default behaviour create a somewhat acceptable and unsurprising result? Just having a JSON string which is created with different rules as the actual to_json
result seems kinda wired to me. Why not flatten the structure as I suggested?
you can work around it by using methods to extract/generate data, that way you have full control. Here's an example:
This would actually solve the problem and would be 'good enough'. However, it would create a lot of boilerplate for a lot of people, I guess.
Thanks for your quick reply! I just edited my initial post again to improve upon it and showing that there is a bit of a problem with nil values in CSV, too.
That actually is documented (https://github.com/kgiszczak/shale#rendering-nil-values). CSV renders nil values by default (as opposed to other formats), because without it, it would mangle the CSV output. render_nil: false
means don't include the field/column at all in the output (so in the case of CSV it would mean one comma less).
E.g. if your first_name = nil
and last_name = 'Doe'
you would get:
first_name,last_name
Doe
If you really need, you can overwrite this behaviour with explicitly setting render_nil: false
- but in the case of CSV you would need that in a very rare circumstances.
That is absolutely correct! However, shouldn't default behaviour create a somewhat acceptable and unsurprising result? Just having a JSON string which is created with different rules as the actual to_json result seems kinda wired to me.
That actually is not a JSON but stringified Ruby Hash, as all formats except XML uses this data structure internally. Underneath to_csv
is calling as_csv
(or as_json
, as_yaml
and so on) method that returns this hash and then converts it to CSV. E.g.
person.as_csv
# => { 'first_name' => 'John', `last_name` => 'Doe' }
Why not flatten the structure as I suggested?
However, it would create a lot of boilerplate for a lot of people, I guess
I totally get it, ang it's not the first time this was raised. I tried to implement a solution for this problem, but it opend a can with worms with so much edge cases that I gave up. I might try to approch this problem again in the future, but there's no a quick solution for this.
I totally get it, ang it's not the first time this was raised. I tried to implement a solution for this problem, but it opend a can with worms with so much edge cases that I gave up. I might try to approch this problem again in the future, but there's no a quick solution for this.
Thanks for your understanding. Can I find your approach back then somewhere? Maybe we can improve upon it, or maybe it is even enough for our use case.
That actually is not a JSON but stringified Ruby Hash, as all formats except XML uses this data structure internally. Underneath to_csv is calling as_csv (or as_json, as_yaml and so on) method that returns this hash and then converts it to CSV.
Ah, I see. So this is some kind of fallback in case of “nested CSV”.
If you really need, you can overwrite this behaviour with explicitly setting render_nil: false - but in the case of CSV you would need that in a very rare circumstances.
You are absolutely right. Not rendering the nil values in a CSV would break the CSV in most cases. I actually suggested this behaviour for the representation of nested data within a CSV. My comment about it showing nil was about that stringified Hash you use internally (which I misunderstood for some wired JSON).
That actually is documented (https://github.com/kgiszczak/shale#rendering-nil-values). CSV renders nil values by default (as opposed to other formats), because without it, it would mangle the CSV output. render_nil: false means don't include the field/column at all in the output (so in the case of CSV it would mean one comma less).
You are right here, too. However, I feel it is a bit inconsistent to have CSV so different behaviour than anything else. Why not make render_nil: true
the default behaviour?
Thanks for your understanding. Can I find your approach back then somewhere? Maybe we can improve upon it, or maybe it is even enough for our use case.
I made pretty good progress on this and actually came up with a solution that I like and works really well. It won't be production ready for at least couple of weeks, but I can push a branch with it for you to try out and hear your feedback on it. I'll do it tomorrow.
Ah, I see. So this is some kind of fallback in case of “nested CSV”.
Not really a fallback, more like a side effect of the underlying CSV parser.
You are right here, too. However, I feel it is a bit inconsistent to have CSV so different behaviour than anything else. Why not make render_nil: true the default behaviour?
Because CSV is much different than JSON, YAML, TOML or XML - in these formats you usually don't want to render empty fields/nodes, in CSV you have to, otherwise you mangle the output.
I made pretty good progress on this and actually came up with a solution that I like and works really well. It won't be production ready for at least couple of weeks, but I can push a branch with it for you to try out and hear your feedback on it. I'll do it tomorrow.
Thanks, that would be very good!
@kgiszczak Just a short reminder. :)
sorry for the delay, I had a lot of work past few days, but I'm almost done, I need one more day to finish tests and write documentation, it'll be ready after the weekend.
Hey @tbsvttr the feature is ready https://github.com/kgiszczak/shale#delegating-fields-to-child-attributes
@kgiszczak Hey, that is very cool! As far as I can see, this solves the problem with relatively little boilerplate code.
Can you release this as v0.10?
I will make a release in a few weeks
Hi! It's me again!
As already mentioned in the other issue, I am very interested in the CSV serialization capabilities of this (cool!) library.
However, it seems that the CSV serialization is not working in a very useful manner when data is nested.
As an example, given this setup:
These are the results:
Note how the JSON and XML can sensible handle the nested data structure, but the CSV now actually contains a JSON string in a field. My guess is that this will not be acceptable for most scenarios where CSV is actually used. Also note that in this JSON string, the nil values are rendered, while they are not rendered anywhere else. No
render_nil
was used anywhere. This lets me guess that this is somewhat of an unhandled or unintended behaviour.My suggestion would be to flatten the structure so that it will be compatible with CSV. Something like this:
Or with
render_nil: true
:Note that it render an empty string into each field, so that the table which is represented in CSV does maintain is coherence for each row.
With the following headers:
What do you think? Or did I overlook some feature of the library?