magento / community-features

Magento Features Development is an Initiative to Allows Community Memebers Join to Development of Magento Features
46 stars 18 forks source link

allow serializer to serialize / deserialize objects #262

Open ioweb-gr opened 4 years ago

ioweb-gr commented 4 years ago

I've been digging into the code of Magento 2 in an attempt to serialize my Models for general use like

When it comes to JSON, currently the \Magento\Framework\Serialize\SerializerInterface is a very simple call to json_encode with an exception and nothing more.

I believe this is not enough for modern frameworks and it's very important to have the ability to serialize a Class Object and be able to deserialize a JSON string to a Class Object. This would effectively let us serialize Models, Collections, SearchResults, etc etc in a very simple way, convert them back and use all the available functions they have.

Symfony already does this since years ago with the serializer component https://symfony.com/doc/current/components/serializer.html and it's very practical.

It can

Now the underlying Magento structure is still using the getData and setData methods and not class properties thus making it a bit more difficult to implement so I don't think symfony's serializer would be easy to integrate as we'd need to rewrite various functions to make it work, however, would it be possible to extend the SerializerInterface to utilize the functions toJson and convertToJson from the \Magento\Framework\DataObject

Then we could serialize Models and Collections by invoking that underlying method in one command and deserialize back to it. Deserialization would need an additional Interface which would allow it to to deserialize the data into a specific Model or Collection.

I'd love to hear your thoughts about this.

thomas-kl1 commented 4 years ago

Take a look on hydrators and data processor in the Magento framework, it should be a good start :)

ioweb-gr commented 4 years ago

I've seen the HydratorInterface but it seems to me it's just converting the Entity Object's data to array and Adding data to an Object Entity. It's quite different to the serializer component. I thought that Hydration usually refers to how an item is retrieved from the Database Layer and converted to a Model, for example to retrieve the results as array or object or any other custom type after a DB Query brings the results. I was talking about converting an already fetched object model or collection or mixed data. Maybe I'm misunderstanding something.

I didn't notice the DataProcessor so I'm going to check how it works to see if that can provide the needed functionality though.

IgorVitol commented 4 years ago

@ioweb-gr Correct. Serializing object "as-is" is not secure comparing to serializing data. Some time ago magento models/providers were refactored to serialize data, instead of objects.

ioweb-gr commented 4 years ago

I forgot to mention, I've already seen a few people in third party modules go through a (simplified) loop like this to send unsafe data for their ajax functions.

$data = []
foreach($items as $item){
  $data[] = $item->getData()
}
$jsonResult->setData($data);
return $jsonResult;

It would be safer imo to have a serializer component take the $items collection and serialize it securely.

IgorVitol commented 4 years ago

@ioweb-gr It looks like you are trying to call this methods on controller (e.g. old M1 style). I would recommend you to use WebAPI instead. It will do all the serialization stuff for you based on interface (e.g. public get methods). https://devdocs.magento.com/guides/v2.3/extension-dev-guide/service-contracts/service-to-web-service.html

For example, you can check "estimate-shipping-methods" calls during filling customer address:

Now, just check how it configured:

Based on this example, you don't need to do any to/from json convertation - this will be done for you, based on interface & it's public methods return types. Note that some methods have optional return types, like - "string|null". In this case, if value doesn't exists - it will not be included in result json.

Note: Return types could be other interfaces, e.g. nesting objects are supported too.

ioweb-gr commented 4 years ago

@IgorVitol No it was just an example for some security concerns too, from stuff I see lurking in third party modules to showcase one area where it could help.

In my particular case I wanted to render a js Component in a block and pass some initialization data to it like

So in my phtml file I want to have this as per DevDocs documentation

<script type="text/x-magento-init">
{
    "*": {
        "vendormodule_init": <?= $block->getConfigJson() ?>
    }
}
</script>

It's also integrating a custom JS library where configuration data needs to be specifically formatted too. I don't think using the rest api for this and pull the data from there when it's not dynamic enough to need doing it via ajax requests, is the correct way. For such a simple task, it would require exposing many new webapi urls and also need mutliple requests slowing down the site rendering unreasonably.

So my use case was much simpler. Fetch some data in my block, serialize them to json, print them to the template. But to do that I'd much rather serialize everything with a dedicated serialization mechanism without leaving the security worries to each developer, but to the framework.

I can see however cases where the rest API could present different challenges. In your example, in the Interface you declare all public methods. Then you enable them in the webapi through the xml configuration and you don't have any further control as to what information is going to be rendered in the response. All the public methods will be available as data in the response.

So for example if I needed just two fields from the product data, product name and image, to create a simple slider via ajax, how would I achieve it in the webapi way? Would I have to fetch all the data first and then trim them down to what I need? This is overkill, isn't it?

I'm comparing it to how easy it would be on symfony to do it. Just annotate the entity properties with a new Group and then serialize using that group. Instant result with the needed data using the serializer component.

My thought about deserializing was for synchronization between 2 magento installations but it can work in different application areas too. Either via webapi or via legacy way, I have to fetch data from a different Magento 2 installation, in json format, but I have to manually code how to turn them back into Magento 2 objects. Only then can I manipulate them and do whatever I need to do.

Deserialization would make this task a oneliner (or close to it) without having to worry about a lot of details.

One more point, looking at your screenshot.

According to OWASP https://cheatsheetseries.owasp.org/cheatsheets/AJAX_Security_Cheat_Sheet.html#always-return-json-with-an-object-on-the-outside

The example you mentioned according to OWASP is a security concern.

image

I'm not into penetration testing but I regularly take a look at the cheatsheet to make sure I'm not writing something vulnerable. You might want to read further into why returning an encompassing array is an issue.

https://haacked.com/archive/2008/11/20/anatomy-of-a-subtle-json-vulnerability.aspx/ (outdated but you never know)

IgorVitol commented 4 years ago

In my particular case I wanted to render a js Component in a block

Ah, ok, now it's clear.

For such a simple task, it would require exposing many new webapi urls and also need mutliple requests slowing down the site rendering unreasonably.

Yes, you are right.

All the public methods will be available as data in the response.

Only getter methods (which doesn't require any arguments) will be present. Keep in mind that data is processed by data interfaces, not entity interface. Only data covered by data interface will be rendered.

Note: As an alternative, take a look at PWA & GraphQL implementation. This might be interesting for you. Example: https://venia.magento.com/

E.g. by sending GraphQL request to Magento backend, you could decide what data from what object (or set of objects) you want to fetch.

but I have to manually code how to turn them back into Magento 2 objects

Ok, I got your point. But that's how it works... you need to operate with data, not entity. Data models, models, resource models - all of this has different responsibility.

It's hard to predict on how to handle different types of objects and their classes using common solution:

You could take a look how it's implemented in WebAPI:

Another example:

As you can see, magento team tried to solve your idea, but keeping strict, documented types of all data & it's keys.

This hovewer looks too complex, when you need just few values passed to block configuration, I guess that's the case for your own serializer class/method.

The example you mentioned according to OWASP is a security concern

This screenshot just demonstrate json encoding results according to configured interface in webapi.xml. This particular JSON will not be used for deserialization, so it's fine.

ioweb-gr commented 4 years ago

Thanks for explaining further. I really find your answer very helpful to explore more things in the framework. I haven't experimented yet with Venia but currently I'm not sure it's the best choice for our customers. Not many businesses in our country will devote the time and money for building a PWA at this period. We're usually as a country a few years back compared to more advanced countries and under an economic crisis. It's not because of a lack of good developers but because of bad business choices. They don't wish to dive so deep when they can turn out a good profit using woocommerce if you know what I mean. The best framework and the best app type is not always the most suitable for everyone's needs. Smaller market, smaller needs. Bigger clients, bigger needs. Judging from how slow the checkout is on cheaper smartphones, the whole site would crawl in like 30% of our country's users with a PWA.

It's hard to predict on how to handle different types of objects and their classes using common solution:

This is actually the whole issue why I'm bringing it up. It's not hard to serialize in my particular case and needs. However, in Magento 2 we have so many types of classes and objects that it makes it a very cumbersome task to serialize all the needed data for every particular scenario and there's no generalized, secure way of doing it. For example in order to create a custom EAV CRUD Entity to store translateable data and expose it to the API you need 49 files including di.xml and acl and everything else to define it. Not counting factories and automatically generated classes that will ultimately get created by the framework but leave you without autocompletion in the IDE.

There's no recommended or suggested way from the framework for a common task used in many situations. Makes refactoring M1 code much more difficult too. Keep in mind that's just for serializing in JSON. Symfony's serializer can serialize as json, yaml, xml by default but csv and whatever else you might need is easily extendable. If you take a look at what people are doing when dealing with XML data in magento 2 to export product feeds, it will bring you to tears.

All properties usually (should be) private and accessed only by public getter methods.

Depends on the architecture but yes. At least protected level, and symfony serializer adheres to this logic. If you annotate a class and don't provide the getter function, you won't receive data from a protected or private property. They're not mutually exclusive.

How you would save serialized class name? You can not save it with class data, since this actually a security issue.

This is not saved in the json data. Deserializing takes a parameter to which class to deserialize it to and for all correctly annotated classes, just throwing in the class name is enough. If you don't have it correctly annotated you have to write your own decoder that extends the Abstract Decoder class. The target class may be a different class than the original one but still compatible. For example, when I retrieve a json response from a remote API I usually create a class that describes the response to have autocompletion and consistent behaviour. It also makes writing mock response, validation code and tests faster. I have to do all the work manually in Magento 1/2 because of a lack of a general serializer / deserializer. I can't just define my class and annotate, then have the serializer handle it for me in a secure way. Relevant functionality https://symfony.com/doc/current/components/serializer.html#deserializing-in-an-existing-object

To be honest I never particularly liked the data array in Magento 2 because it messed with my IDE's autocompletion. From the data array stems the reason why different areas use different names for the same properties, like when using setup to install different attributes but I can see why it was needed as a dynamic, extendable structure. Interfaces compensate for that in Magento 2 to some extend.

I'm especially bringing up symfony, because Drupal (competitor in headless installations, actually I think it's number 1 in this aspect worldwide) and Prestashop a direct ecommerce competitor have long since migrated to a symfony base and already see those benefits.

As you can see, magento team tried to solve your idea, but keeping strict, documented types of all data & it's keys. This hovewer looks too complex, when you need just few values passed to block configuration,

Again this was my particular case which drove me to write this thread, but is Magento 2 only supposed to serve complex scenarios and API-centric apps and focus on those? Should it neglect simpler tasks and have developers reinvent the wheel? I'm not trying to say that one framework is better than the other, just pointing out the good stuff on each one to maybe include some of them and improve further to make our lives easier, not harder.

I guess that's the case for your own serializer class/method.

That's what I'm trying to avoid. Having each developer writing his own serializer class / method without a guidline to make sure it's secure and focus on a reusable interface. In that case, when serializing you'll get

The interface already exists, the SerializerInterface, it's just very limited in scope and would love to see it extended.