Closed Habeeb556 closed 2 months ago
Please provide a minimal working example to reproduce this. Not just a line of code w/o any context.
@ThiefMaster here is the application log:
Fetching CSV from results backend [29c6731c-f1d2-4636-9e1a-80889fac9d55]
2024-07-09 12:25:32,657:INFO:superset.commands.sql_lab.export:Fetching CSV from results backend [29c6731c-f1d2-4636-9e1a-80889fac9d55]
Decompressing
2024-07-09 12:25:32,659:INFO:superset.commands.sql_lab.export:Decompressing
Using pandas to convert to CSV
2024-07-09 12:25:32,916:INFO:superset.commands.sql_lab.export:Using pandas to convert to CSV
And the export.py file that converts the data to CSV:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
from __future__ import annotations

import logging
from typing import Any, cast, TypedDict

import pandas as pd
from flask_babel import gettext as __

from superset import app, db, results_backend, results_backend_use_msgpack
from superset.commands.base import BaseCommand
from superset.errors import ErrorLevel, SupersetError, SupersetErrorType
from superset.exceptions import SupersetErrorException, SupersetSecurityException
from superset.models.sql_lab import Query
from superset.sql_parse import ParsedQuery
from superset.sqllab.limiting_factor import LimitingFactor
from superset.utils import core as utils, csv
from superset.views.utils import _deserialize_results_payload

config = app.config

logger = logging.getLogger(__name__)


class SqlExportResult(TypedDict):
    query: Query
    count: int
    data: list[Any]


class SqlResultExportCommand(BaseCommand):
    _client_id: str
    _query: Query

    def __init__(
        self,
        client_id: str,
    ) -> None:
        self._client_id = client_id

    def validate(self) -> None:
        self._query = (
            db.session.query(Query).filter_by(client_id=self._client_id).one_or_none()
        )
        if self._query is None:
            raise SupersetErrorException(
                SupersetError(
                    message=__(
                        "The query associated with these results could not be found. "
                        "You need to re-run the original query."
                    ),
                    error_type=SupersetErrorType.RESULTS_BACKEND_ERROR,
                    level=ErrorLevel.ERROR,
                ),
                status=404,
            )

        try:
            self._query.raise_for_access()
        except SupersetSecurityException as ex:
            raise SupersetErrorException(
                SupersetError(
                    message=__("Cannot access the query"),
                    error_type=SupersetErrorType.QUERY_SECURITY_ACCESS_ERROR,
                    level=ErrorLevel.ERROR,
                ),
                status=403,
            ) from ex

    def run(
        self,
    ) -> SqlExportResult:
        self.validate()
        blob = None
        if results_backend and self._query.results_key:
            logger.info(
                "Fetching CSV from results backend [%s]", self._query.results_key
            )
            blob = results_backend.get(self._query.results_key)
        if blob:
            logger.info("Decompressing")
            payload = utils.zlib_decompress(
                blob, decode=not results_backend_use_msgpack
            )
            obj = _deserialize_results_payload(
                payload, self._query, cast(bool, results_backend_use_msgpack)
            )

            df = pd.DataFrame(
                data=obj["data"],
                dtype=object,
                columns=[c["name"] for c in obj["columns"]],
            )

            logger.info("Using pandas to convert to CSV")
        else:
            logger.info("Running a query to turn into CSV")
            if self._query.select_sql:
                sql = self._query.select_sql
                limit = None
            else:
                sql = self._query.executed_sql
                limit = ParsedQuery(
                    sql,
                    engine=self._query.database.db_engine_spec.engine,
                ).limit
            if limit is not None and self._query.limiting_factor in {
                LimitingFactor.QUERY,
                LimitingFactor.DROPDOWN,
                LimitingFactor.QUERY_AND_DROPDOWN,
            }:
                # remove extra row from `increased_limit`
                limit -= 1
            df = self._query.database.get_df(sql, self._query.schema)[:limit]

        csv_data = csv.df_to_escaped_csv(df, index=False, **config["CSV_EXPORT"])

        return {
            "query": self._query,
            "count": len(df.index),
            "data": csv_data,
        }
In version 3.x, the exported CSV data looks like this:
However, in version 2.x, it looks like this:
We need a minimal reproducible example demonstrating that the issue is in Werkzeug. We can't debug Apache Superset or this complicated example. This also looks like an encoding issue either during writing the file or opening in Excel, which is not handled by Werkzeug.
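For anyone trying to narrow this down, here is a rough sketch of the kind of encoding mismatch described above, using only the Python standard library. It is not taken from Superset or Werkzeug. The sample text and the choice of GBK as the "wrong" decoder are assumptions made purely for illustration, and nothing in this thread confirms which encoding the spreadsheet application actually guessed.

```python
import codecs

# Hypothetical sample row with non-English characters (assumption, not from the report).
text = "مرحبا,1\n"

# The same text written two ways: plain UTF-8, and UTF-8 with a leading BOM.
plain = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")

# "utf-8-sig" only differs by the three-byte BOM prefix.
print(with_bom == codecs.BOM_UTF8 + plain)

# Decoding with the matching codec round-trips cleanly in both cases.
print(plain.decode("utf-8") == text)
print(with_bom.decode("utf-8-sig") == text)

# If a consumer guesses a different encoding (GBK is an assumed example),
# the very same bytes turn into unrelated characters.
misread = plain.decode("gbk", errors="replace")
print(repr(misread))
```

If the file written under one version starts with a BOM and the file written under the other does not, that would point at where the bytes are encoded rather than at how they are displayed.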
We need a minimal reproducible example demonstrating that the issue is in Werkzeug
+1, the current issue with Werkzeug refers to Superset-specific code and they won't be able to repro/fix. Though we may get lucky and they might point to some 3.x breaking change they knew might cause this kind of issue. In any case, it'd be good to get to the specifics of where the behavior is different from what you'd expect in Werkzeug itself.
This is strange. To reproduce the issue, I did the following:
pip install Werkzeug==3.0.3
pip install Werkzeug==2.3.8
Right, though the "minimal reproducible example" would figure out exactly which method/feature we're using from Werkzeug has changed behavior across those versions, and remove all the related Superset-specific logic. Without knowing what caused the change in behavior, it's hard to even assert that the 2.3.8 version is the correct behavior. It may happen to be the desired behavior for you in your particular use case, but that doesn't mean it's right.
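One way to get closer to those specifics without any Superset code is to compare the raw bytes of the CSV downloaded under each version. The helper below is only a sketch: `detect_bom` and `summarize` are hypothetical names, not part of Werkzeug or Superset, and the two sample payloads stand in for the real downloads.

```python
import codecs
from typing import Optional

# Known byte-order marks, UTF-32 first so it is not mistaken for UTF-16
# (the UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes) -> Optional[str]:
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def summarize(data: bytes, head: int = 16) -> str:
    """Describe the BOM (if any) and the first few bytes in hex."""
    return f"bom={detect_bom(data)} head={data[:head].hex(' ')}"


# Stand-ins for the two downloads; in practice, read each saved file in "rb" mode.
sample_a = "مرحبا,1\n".encode("utf-8-sig")
sample_b = "مرحبا,1\n".encode("utf-8")
print(summarize(sample_a))
print(summarize(sample_b))
```

If the two summaries differ only in the BOM, the next step would be finding which call decides the response encoding, and reproducing just that call against both Werkzeug versions.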
Download to CSV shows special Chinese characters
This bug is related to Apache Superset. When I try to download a CSV query containing non-English characters, I get special Chinese characters as shown in the attached example: https://github.com/apache/superset/pull/29506. Even setting the export configuration as follows:
This behavior didn't exist in Werkzeug version 2.3.8, but starting from version 3.x we encountered this error.
Environment: