jeffgortmaker / pyblp

BLP Demand Estimation with Python
https://pyblp.readthedocs.io
MIT License

Question: Most Efficient Way of Producing LaTeX Tables of PyBLP Results and Estimates #158

mcket747econ closed this issue 1 month ago

mcket747econ commented 1 month ago

Hello,

I am very new to PyBLP and have been trying to figure out the best way to export the parameter estimates, standard errors, and other relevant statistics into LaTeX tables. While pandas DataFrames can be exported with .to_latex(), there does not appear to be a smooth way to export the GMM parameter estimates. My current solution is to use PyBLP within R and then do some wrangling to get the estimates into Stargazer. Do you know of a more efficient approach to producing tables such as Tables 7 and 8 in Conlon and Gortmaker (2020)? [screenshot]

Any help is appreciated!

My current R/Stargazer approach is the following:

```r
  product_formulations = tuple(
    pyblp$Formulation('0 + prices + AVERAGE_COUNT_ROOMS',absorb='C(product_ids)'),# Linear demand
    pyblp$Formulation('1 + prices + AVERAGE_COUNT_ROOMS')),
  agent_formulation = pyblp$Formulation('INCOME_UNDER_US_25_000 + INCOME_US_25_000_US_59_999 \
+ INCOME_US_60_000_US_99_999 + INCOME_US_100_000_AND_OVER + JAPAN + MIDWEST + NORTHEAST + ONTARIO_QUEBEC'),
  product_data = read_csv("Creating_BLP_Dataset/Output/df_prod2.csv"),
  agent_data = read_csv("Creating_BLP_Dataset/Output/df_demo1.csv")
)

regressors <- c("Price","Average_Count_Rooms","PricexIncome_Under_25000","PricexIncome_25000_59999","PrixexIncome_US60000_US99999","PricexIncome_Over_US100000",
                "Japan", "Midwest","Northwest","Ontario_or_Quebec")
bta = as.data.frame(matrix(rnorm(10 * 11), nc = 11))
bta = as.data.frame(results$beta)
names(bta) <- c("estimates",regressors)
f <- as.formula(paste("estimates", "~ 0 +", paste(regressors, collapse = "+")))
p <-lm(f,bta)

xyz = as.vector(results$beta_se)
xyzl= as.vector(results$beta)
stargazer(p, type = "latex", 
          coef = list(xyzl),
          se = list(xyz),
          t = list(xyzl / xyz),
          float=FALSE,
          omit.stat = "all",out ="path.tex")
```

jeffgortmaker commented 1 month ago

Unfortunately I won't be of much help here! Formatting results is beyond the scope of PyBLP.

In my own work (including for the above screenshot) I tend to just build tables custom instead of relying on other packages, which I've found don't really afford enough flexibility (and I always forget how to use their interfaces).

No idea if this is helpful, but here's a copy/paste of a quick class I threw together, iterations of which I've used for making custom tables with Python that are in the style I like.

```python
class TableFormatter:
    """LaTeX table formatter."""

    def __init__(self, subtables=1, longtable=False, longtable_continue_start=0):
        """Initialize empty cells."""
        assert not (subtables > 1 and longtable)
        self.subtables = subtables
        self.longtable = longtable
        self._longtable_continue_start = longtable_continue_start
        self._header_cells = [[] for _ in range(self.subtables)]
        self._data_cells = [[] for _ in range(self.subtables)]
        self._header_extras = [{} for _ in range(self.subtables)]
        self._data_extras = [{} for _ in range(self.subtables)]

    def add_header_row(self, values, row=None, subtable=0):
        """Set a row in a header."""
        self._add_row(values, row, subtable, header=True)

    def add_data_row(self, values, row=None, subtable=0):
        """Set a row."""
        self._add_row(values, row, subtable, header=False)

    def add_header_line(self, row=None, subtable=0):
        """Add a midrule in a header."""
        self._insert_after_row(r'\midrule', row, subtable, header=True)

    def add_data_line(self, row=None, subtable=0):
        """Add a midrule."""
        self._insert_after_row(r'\midrule', row, subtable, header=False)

    def add_header_space(self, row=None, subtable=0):
        """Add vertical space in a header."""
        self._insert_after_row(r'\addlinespace', row, subtable, header=True)

    def add_data_space(self, row=None, subtable=0):
        """Add vertical space."""
        self._insert_after_row(r'\addlinespace', row, subtable, header=False)

    def add_header_page_break(self, row=None, subtable=0):
        """Add a page break in a header."""
        self._insert_after_row(r'\pagebreak', row, subtable, header=True)

    def add_data_page_break(self, row=None, subtable=0):
        """Add a page break."""
        self._insert_after_row(r'\pagebreak', row, subtable, header=False)

    def add_header_underline(self, specs, row=None, subtable=0):
        """Add underlines in a header."""
        value = ''.join(r'\cmidrule(%s){%s-%s}' % s for s in specs)
        self._insert_after_row(value, row, subtable, header=True)

    def add_data_underline(self, specs, row=None, subtable=0):
        """Add underlines."""
        value = ''.join(r'\cmidrule(%s){%s-%s}' % s for s in specs)
        self._insert_after_row(value, row, subtable, header=False)

    # noinspection PyTypeChecker
    def _add_row(self, values, row, subtable, header):
        """Add a row."""
        cells = (self._header_cells if header else self._data_cells)[subtable]

        if row is None:
            row = len(cells)

        for column, value in enumerate(values):
            # extend the cells if necessary
            while len(cells) < row + 1:
                cells.append([])
            while len(cells[row]) < column + 1:
                cells[row].append('')

            # identify multicolumns
            columns = 1
            alignment = None
            if isinstance(value, (list, tuple)) and isinstance(value[0], int):
                columns, alignment, value = value

            # join multivalued values into a stack
            if isinstance(value, (list, tuple)):
                value = '\\shortstack{%s}' % ' \\\\ '.join(str(v) for v in value)
            else:
                value = str(value)

            # construct multicolumns
            if columns > 1:
                value = r'\multicolumn{%s}{%s}{%s}' % (columns, alignment, value)

            cells[row][column] = value

    def _insert_after_row(self, value, row, subtable, header):
        """Add extra content after a row."""
        cells = (self._header_cells if header else self._data_cells)[subtable]
        extras = self._header_extras if header else self._data_extras

        if row is None:
            row = len(cells) - 1

        extras[subtable][row] = value

    def __str__(self):
        """Format the table."""
        formatted = [r'\toprule']
        if self.longtable:
            for first_head in [True, False]:
                if not first_head:
                    formatted.extend([
                        r'\midrule',
                        r'\multicolumn{%s}{c}{Continued from the previous page.} \\' % len(self._data_cells[0][0]),
                        r'\midrule'
                    ])

                for row, values in enumerate(self._header_cells[0]):
                    if not first_head and row < self._longtable_continue_start:
                        continue
                    formatted.append('%s \\\\' % ' & '.join(values))
                    if row in self._header_extras[0]:
                        formatted.append(self._header_extras[0][row])

                formatted.append(r'\midrule')
                formatted.append(r'\endfirsthead' if first_head else r'\endhead')

            formatted.extend([
                r'\midrule',
                r'\multicolumn{%s}{c}{Continued on the next page.} \\' % len(self._data_cells[0][0]),
                r'\midrule',
                r'\endfoot',
                r'\endlastfoot'
            ])
            for row, values in enumerate(self._data_cells[0]):
                formatted.append(r'%s \\' % ' & '.join(values))
                if row in self._data_extras[0]:
                    formatted.append(self._data_extras[0][row])

            formatted.append(r'\bottomrule')
        else:
            for subtable in range(self.subtables):
                for row, values in enumerate(self._header_cells[subtable]):
                    formatted.append(r'%s \\' % ' & '.join(values))
                    if row in self._header_extras[subtable]:
                        formatted.append(self._header_extras[subtable][row])

                formatted.append(r'\midrule')

                for row, values in enumerate(self._data_cells[subtable]):
                    formatted.append(r'%s \\' % ' & '.join(values))
                    if row in self._data_extras[subtable]:
                        formatted.append(self._data_extras[subtable][row])

                formatted.append(r'\bottomrule' if subtable == self.subtables - 1 else r'\midrule')

        return '\n'.join(formatted)

    def save(self, path):
        """Save the table to a file."""
        with open(path, 'w') as handle:
            handle.write(str(self))
```

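
As a rough illustration of how estimates might be fed into a formatter like this, the sketch below turns coefficients and standard errors into the kind of row lists that `add_data_row` accepts. The labels and numbers are made-up placeholders standing in for flattened arrays such as `results.beta` and `results.beta_se`, and `estimate_rows` is a hypothetical helper written for this example, not part of PyBLP or of the class above.

```python
def estimate_rows(labels, estimates, standard_errors, digits=3):
    """Build one row per parameter: [label, (estimate, '(se)')].

    The tuple in the second column is meant to be stacked in a single cell,
    with the standard error in parentheses under the point estimate.
    """
    rows = []
    for label, estimate, se in zip(labels, estimates, standard_errors):
        cell = (f'{estimate:.{digits}f}', f'({se:.{digits}f})')
        rows.append([label, cell])
    return rows


# Placeholder values for illustration only, not real estimates.
labels = ['Price', 'Rooms']
beta = [-1.234, 0.5]
beta_se = [0.056, 0.25]

rows = estimate_rows(labels, beta, beta_se)
for row in rows:
    print(row)
```

Each resulting row could then be passed to `add_data_row` on a `TableFormatter` instance. NumPy column vectors would need to be flattened first, for example with `.flatten()`.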
mcket747econ commented 1 month ago

Hi Jeff,

Thank you for your response and for including your LaTeX class code. It actually is helpful for me and I will try to employ it for my own uses. As a quick follow-up, what are the data types for the arguments "row" and "values" that are used for the various methods? I seem to be running into an issue where the values I am supplying are not callable. Thanks!

Best Regards,

Matt



jeffgortmaker commented 1 month ago

The values argument is a list of strings (or tuples of strings to do a shortstack), one for each column, and if specified, the row argument is an integer index. I usually leave row unspecified, in which case it gets automatically built from top to bottom.
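
To make the accepted shapes concrete, here is a minimal standalone sketch that mirrors the per-cell logic of `_add_row` in the class posted earlier in this thread. `render_cell` is a hypothetical helper written for illustration only; it also adds a small guard against empty tuples that the original code does not have.

```python
def render_cell(value):
    """Render one entry of `values` roughly the way `_add_row` does."""
    columns, alignment = 1, None

    # A (columns, alignment, value) triple whose first item is an int
    # marks a cell that spans several columns.
    if isinstance(value, (list, tuple)) and value and isinstance(value[0], int):
        columns, alignment, value = value

    # A list or tuple of strings is stacked vertically inside one cell.
    if isinstance(value, (list, tuple)):
        value = r'\shortstack{%s}' % r' \\ '.join(str(v) for v in value)
    else:
        value = str(value)

    if columns > 1:
        value = r'\multicolumn{%s}{%s}{%s}' % (columns, alignment, value)
    return value


# One entry per column: a plain string, a stacked pair, and a multicolumn.
values = ['Price', ('-1.234', '(0.056)'), (2, 'c', 'Demographics')]
row = ' & '.join(render_cell(v) for v in values) + r' \\'
print(row)
```

With the class itself, the same `values` list would be passed as `add_data_row(values)`, leaving `row` unspecified so that rows are appended from top to bottom.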

mcket747econ commented 1 month ago

Hi Jeff,

Excellent! Thank you so much. This is super helpful and will allow me to get my BLP estimates into tables far more quickly. I appreciate the help!

Best Regards,

Matt

